Intuition. In many cases, data can be represented as a sum of commonly occurring components. For example, a handwritten digit is simply a combination of pen strokes.
We formalize these properties into the following:
- Each component is an element of a common language, i.e., a dictionary
- Data is constructed as a weighted sum of dictionary items
- Often, only a few types of pen strokes are used for a single digit. Thus we want to choose (= weight) only a few components; i.e., enforce sparsity in the latent representation
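To make this setup concrete, here is a minimal NumPy sketch of the generative view: a datapoint built as a sparse weighted sum of unit-norm dictionary columns. The sizes and variable names are illustrative, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 64, 128                       # e.g., 8x8 image patches, 128 atoms (illustrative sizes)

D = rng.standard_normal((d, k))
D /= np.linalg.norm(D, axis=0)       # dictionary: one unit-norm column ("atom") per component

z = np.zeros(k)
active = rng.choice(k, size=3, replace=False)
z[active] = rng.standard_normal(3)   # sparse latent code: only a few components are used

x = D @ z                            # datapoint = weighted sum of the chosen dictionary items
```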
def. Sparse Coding & Dictionary Learning. Given a dictionary $D \in \mathbb{R}^{d \times k}$, let $x \in \mathbb{R}^d$ be a datapoint and $z \in \mathbb{R}^k$ be the corresponding latent representation obtained by:

$$z = \arg\min_{z'} \tfrac{1}{2} \lVert x - D z' \rVert_2^2 + \lambda \lVert z' \rVert_1$$
i.e., the latent representation is, by definition, the minimizer of the (sparsity-regularized) reconstruction loss. This is an odd way to define a latent representation, but it is useful in that, given a dictionary $D$, the code $z$ always minimizes the reconstruction loss; part of the training is built into the inference.
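Since this inference problem is exactly a LASSO in $z$, one can sanity-check it with an off-the-shelf solver before turning to specialized algorithms. The sketch below assumes scikit-learn is available; `sparse_code` is our own illustrative wrapper, and ISTA (discussed below) solves the same problem directly.

```python
from sklearn.linear_model import Lasso

def sparse_code(x, D, lam=0.1):
    """Infer z = argmin_z 0.5*||x - D z||_2^2 + lam*||z||_1 given a fixed dictionary D.

    scikit-learn's Lasso minimizes (1/(2*n_samples))*||y - Xw||^2 + alpha*||w||_1,
    so with y = x and X = D we set alpha = lam / len(x) to match the objective above.
    """
    lasso = Lasso(alpha=lam / len(x), fit_intercept=False, max_iter=10_000)
    lasso.fit(D, x)
    return lasso.coef_
```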
- We also enforce that the columns of $D$ have norm $1$, because if we don't, the sparsity objective can 'cheat': the entries of $D$ grow to compensate for the entries of $z$ being shrunk toward zero.
- This optimization is done with the Iterative Shrinkage-Thresholding Algorithm (ISTA), because the non-smooth sparsity penalty makes standard gradient-based tools hard to apply (a sketch is given below).

Objective. Now, how do we get $D$? Sometimes it's obvious (e.g., sentences are composed of words), but in most cases it is not. We obtain the optimal $D$ with the following objective:

$$\min_{D,\, \{z_i\}} \sum_i \tfrac{1}{2} \lVert x_i - D z_i \rVert_2^2 + \lambda \lVert z_i \rVert_1 \quad \text{s.t.} \ \lVert D_{:,j} \rVert_2 = 1 \ \ \forall j$$
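Here is a minimal sketch of ISTA for the sparse coding (inference) step, under the objective above. The step size $1/\lVert D \rVert_2^2$ and iteration count are illustrative choices, not prescribed by the text.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1: shrink each entry toward zero by t.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(x, D, lam=0.1, n_iters=200):
    """Sparse coding of a single datapoint x given a dictionary D (columns = atoms)."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2       # 1/L, L = Lipschitz constant of the smooth part
    z = np.zeros(D.shape[1])
    for _ in range(n_iters):
        grad = D.T @ (D @ z - x)                 # gradient of 0.5 * ||x - D z||^2
        z = soft_threshold(z - step * grad, step * lam)   # gradient step + shrinkage enforces sparsity
    return z
```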
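One common way to attack this joint objective, sketched below under the same assumptions, is to alternate between the two blocks of variables: sparse-code every datapoint with $D$ fixed (using `ista` from the sketch above), then update $D$ by least squares and re-normalize its columns to satisfy the unit-norm constraint. This is an illustrative scheme, not necessarily the exact procedure the notes have in mind.

```python
import numpy as np

def learn_dictionary(X, k, lam=0.1, n_outer=30, seed=0):
    """Alternating minimization for the dictionary learning objective.

    X: (d, n) data matrix with datapoints as columns; k: number of dictionary atoms.
    Uses ista() from the previous sketch for the sparse coding step.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    D = rng.standard_normal((d, k))
    D /= np.linalg.norm(D, axis=0, keepdims=True)         # start with unit-norm columns
    for _ in range(n_outer):
        # 1) Sparse coding: fix D, infer each latent code z_i.
        Z = np.stack([ista(X[:, i], D, lam) for i in range(n)], axis=1)   # (k, n)
        # 2) Dictionary update: fix Z, least-squares fit D = X Z^T (Z Z^T)^+,
        #    then re-normalize columns to enforce the norm-1 constraint.
        D = X @ Z.T @ np.linalg.pinv(Z @ Z.T)
        D /= np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1e-12)
    return D, Z
```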