Motivation. Autoencoders have the problem of an unstructured latent space: when sampling purely at random in the latent space and decoding the sample back into data space, the generated data is not very good, because only a few 'locations' in the latent space represent realistic data. To solve this, we must 'structure' the latent space so that sampling from it almost always yields realistic data. The simplest idea is to force the latent space to follow a standard normal distribution.

def. Variational Autoencoder (VAE).¹ ² Let $p(x)$ be the probability distribution of the data $x$, and $p(z)$ the latent distribution of $z$.

  1. The encoder encodes the data $x$ into latent variables $z$. This forms the conditional latent distribution, $q_\phi(z \mid x)$.
    • Note that the latent representation is not a single vector as in a plain autoencoder, but a full probability distribution.
  2. The decoder then takes a sampled latent point $z \sim q_\phi(z \mid x)$ and decodes it back into data space, giving a distribution over the data, $p_\theta(x \mid z)$.

Objective. The loss, the negative of the Evidence Lower Bound (ELBO), can be shown to be as follows (a PyTorch sketch of the model and loss appears after the list):

$$\mathcal{L}_{\theta,\phi}(x) = -\,\mathbb{E}_{z \sim q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] + D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p(z)\big)$$

where:

  3. The expectation (reconstruction) term is equivalent, up to constants, to the L2 norm between the original image and its reconstruction.
  4. The KL-divergence term simply makes sure that $q_\phi(z \mid x)$ resembles a standard normal distribution $\mathcal{N}(0, I)$.
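Below is a minimal PyTorch sketch of the encoder/decoder pair described in steps 1 and 2, in the spirit of the tutorial in footnote 2; the names (`VAE`, `input_dim=784`, `latent_dim=2`, etc.) are illustrative assumptions rather than anything fixed by the sources. The encoder maps $x$ to the mean and log-variance of $q_\phi(z \mid x)$, and sampling uses the reparameterization trick $z = \mu + \sigma \cdot \epsilon$ so that gradients can flow through the sampling step.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    """Minimal VAE: encoder outputs a Gaussian q_phi(z|x), decoder models p_theta(x|z)."""

    def __init__(self, input_dim=784, hidden_dim=400, latent_dim=2):
        super().__init__()
        # Encoder: x -> (mu, log sigma^2) parameterizing q_phi(z|x)
        self.enc = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)
        # Decoder: z -> mean of p_theta(x|z), i.e. the reconstruction of x
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, input_dim), nn.Sigmoid(),
        )

    def encode(self, x):
        h = self.enc(x)
        return self.fc_mu(h), self.fc_logvar(h)

    def reparameterize(self, mu, logvar):
        # Differentiable sample z ~ q_phi(z|x): z = mu + sigma * eps, eps ~ N(0, I)
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + std * eps

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        x_hat = self.dec(z)
        return x_hat, mu, logvar
```

The `Sigmoid` on the decoder output assumes inputs scaled to $[0, 1]$ (e.g. flattened MNIST pixels); other data would call for a different output layer.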
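And a sketch of the loss above under the same assumptions: the expectation term is written as a summed squared error (the L2 form mentioned in point 3, which is what the Gaussian log-likelihood reduces to, up to constants, when the decoder variance is fixed), and the KL term uses the closed form for a diagonal Gaussian against $\mathcal{N}(0, I)$.

```python
import torch
import torch.nn.functional as F

def vae_loss(x_hat, x, mu, logvar):
    """Negative ELBO: reconstruction term + KL(q_phi(z|x) || N(0, I))."""
    # Expectation / reconstruction term: squared error between x and its
    # reconstruction (the L2 form, valid up to constants for a Gaussian decoder).
    recon = F.mse_loss(x_hat, x, reduction="sum")
    # KL divergence between a diagonal Gaussian N(mu, sigma^2) and N(0, I),
    # using the closed form -1/2 * sum(1 + log sigma^2 - mu^2 - sigma^2).
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```

In a training loop, `x_hat`, `mu`, and `logvar` would come from the model sketched above, and the returned negative-ELBO loss would be minimized via `loss.backward()` and an optimizer step.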

Footnotes

  1. Variational Autoencoders | Generative AI Animated - YouTube

  2. Variational AutoEncoders (VAE) with PyTorch - Alexander Van de Kleut