Paper Note: VITS

Posted on 2024-08-30 Edited on 2024-10-09 In Speech AI

Abstract

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech [1].

VITS aims to improve the performance of end-to-end (single-stage) TTS models so that the quality of the synthesized speech matches or exceeds that of two-stage systems. The paper was published at ICML 2021.

This note explains and summarizes VITS.

Normalizing Flow

Posted on 2024-10-03 Edited on 2024-10-09 In Mathematics

Abstract

A normalizing flow converts a simple probability density (e.g., a Gaussian distribution) into a more complex one. It can be used in generative modeling, reinforcement learning, variational inference, and so on. “Flow” means that the data “flow” through a series of bijections (invertible mappings) into the appropriate representation space. “Normalizing” means that the transformed density still integrates to $1$, so it remains a valid probability density function.
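As a rough illustration of the "flow" and "normalizing" ideas above, here is a minimal sketch using one-dimensional affine bijections and a standard normal base density. The function names, the affine layer form, and the parameter values are assumptions made for this example and are not taken from the post itself.

```python
import math

def standard_normal_logpdf(z):
    """Log-density of the base distribution N(0, 1)."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))

def flow_logpdf(x, layers):
    """Log-density of x under a stack of affine bijections.

    `layers` is a list of (scale, shift) pairs applied in order to a base
    sample z, each computing y = scale * y_prev + shift (scale != 0).
    """
    log_det = 0.0
    # Invert the layers from last to first to recover the base sample.
    for scale, shift in reversed(layers):
        x = (x - shift) / scale
        log_det += math.log(abs(scale))
    # Change of variables: log p_x(x) = log p_z(z) - log|det J|.
    return standard_normal_logpdf(x) - log_det

def integrate_density(layers, lo=-40.0, hi=40.0, steps=20_000):
    """Midpoint-rule estimate of the integral of the flow density."""
    width = (hi - lo) / steps
    return width * sum(
        math.exp(flow_logpdf(lo + (i + 0.5) * width, layers))
        for i in range(steps)
    )

layers = [(2.0, 1.0), (0.5, -3.0)]  # illustrative values only
total = integrate_density(layers)
```

The log-determinant term is what keeps the transformed density normalized, so `total` should come out very close to 1 regardless of the chosen scales and shifts.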

Variational Auto-Encoders

Posted on 2024-09-05 Edited on 2024-10-09 In Mathematics

Abstract

Variational Auto-Encoders (VAEs) are a type of generative model that combines probabilistic graphical models with neural networks. They can learn latent variable representations of data and generate new data samples. Instead of maximizing the log-likelihood directly, which is often intractable, VAEs maximize a variational lower bound on it.
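For reference, the variational lower bound can be written as follows. The notation is an assumption made here (encoder $q_\phi(z \mid x)$, decoder $p_\theta(x \mid z)$, prior $p(z)$) and may differ from the symbols used in the full post.

$$
\log p_\theta(x) = \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big)}_{\text{variational lower bound}} + D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)\big)
$$

Because the last KL term is non-negative, the bracketed expression never exceeds $\log p_\theta(x)$, and the gap closes when $q_\phi(z \mid x)$ matches the true posterior $p_\theta(z \mid x)$.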

Beamforming in Speech Processing

Posted on 2024-08-27 Edited on 2024-10-09

Abstract

Beamforming is a technique that enhances speech signal quality by steering the reception or transmission of sound waves toward a chosen direction, and it plays a crucial role in speech signal processing. Traditional beamforming algorithms, such as Minimum Variance Distortionless Response (MVDR), and recent deep learning-based methods, such as ADL-MVDR, have been widely applied. This article introduces the principles, implementation, and trade-offs of the mainstream beamforming algorithms.
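To make the MVDR idea concrete, here is a minimal sketch of the textbook weight formula $w = R^{-1} d / (d^H R^{-1} d)$ for a single frequency bin, where $R$ is the noise covariance matrix and $d$ the steering vector. The helper names, the two-microphone setup, and the numeric values are assumptions for illustration only, and a real implementation would typically use a linear algebra library instead of the hand-written solver.

```python
import cmath
import math

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Move the row with the largest pivot magnitude into place.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * p for a, p in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def mvdr_weights(noise_cov, steering):
    """MVDR weights w = R^{-1} d / (d^H R^{-1} d) for one frequency bin."""
    r_inv_d = solve(noise_cov, steering)
    denom = sum(d.conjugate() * v for d, v in zip(steering, r_inv_d))
    return [v / denom for v in r_inv_d]

# Hypothetical two-microphone example (values chosen for illustration).
noise_cov = [[2.0 + 0j, 0.5 + 0.5j],
             [0.5 - 0.5j, 1.0 + 0j]]            # Hermitian, positive definite
steering = [1.0 + 0j, cmath.exp(-1j * math.pi / 3)]

weights = mvdr_weights(noise_cov, steering)
# Response toward the target direction, w^H d.
response = sum(w.conjugate() * d for w, d in zip(weights, steering))
```

The "distortionless" constraint shows up as `response` being equal to 1 up to rounding error, while the weights minimize the output noise power subject to that constraint.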

Text to Speech: A Review

Posted on 2024-08-26 Edited on 2024-10-09 In Speech AI

Something is coming…

Speech Enhancement: A Review

Posted on 2024-08-26 Edited on 2024-10-09 In Speech AI

Something is coming…

Knowledge Distillation for Speech Enhancement

Posted on 2024-08-25 Edited on 2024-10-09 In Speech AI

Something is coming…