Intuition. In many cases, data can be represented as a sum of commonly occurring components. For example, a handwritten digit is simply a combination of pen strokes.
We formalize these properties into the following:
- Each component is an element of a common language, i.e., a dictionary
- Data is constructed as a weighted sum of dictionary items
- Often, only a few types of pen strokes are used for a single digit. Thus we want to choose (= weight) only a few components; i.e., enforce sparsity in the latent representation
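To make this setup concrete, here is a minimal NumPy sketch of the generative view: a datapoint built as a sparse weighted sum of unit-norm dictionary columns. The sizes and variable names are illustrative, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 64, 128                       # e.g., 8x8 image patches, 128 atoms (illustrative sizes)

D = rng.standard_normal((d, k))
D /= np.linalg.norm(D, axis=0)       # dictionary: one unit-norm column ("atom") per component

z = np.zeros(k)
active = rng.choice(k, size=3, replace=False)
z[active] = rng.standard_normal(3)   # sparse latent code: only a few components are used

x = D @ z                            # datapoint = weighted sum of the chosen dictionary items
```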
def. Sparse Coding & Dictionary Learning. Given a dictionary $D \in \mathbb{R}^{d \times k}$, let $x \in \mathbb{R}^d$ be a datapoint and $z \in \mathbb{R}^k$ be the corresponding latent representation obtained by:

$$z = \arg\min_{z'} \tfrac{1}{2} \lVert x - D z' \rVert_2^2 + \lambda \lVert z' \rVert_1$$
i.e., the latent representation is, by definition, the minimizer of the (sparsity-regularized) reconstruction loss. This is an odd way to define a latent representation, but it is useful in that, given a dictionary $D$, the code $z$ always minimizes the reconstruction loss; part of the training is built into the inference.
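Since this inference problem is exactly a LASSO in $z$, one can sanity-check it with an off-the-shelf solver before turning to specialized algorithms. The sketch below assumes scikit-learn is available; `sparse_code` is our own illustrative wrapper, and ISTA (discussed below) solves the same problem directly.

```python
from sklearn.linear_model import Lasso

def sparse_code(x, D, lam=0.1):
    """Infer z = argmin_z 0.5*||x - D z||_2^2 + lam*||z||_1 given a fixed dictionary D.

    scikit-learn's Lasso minimizes (1/(2*n_samples))*||y - Xw||^2 + alpha*||w||_1,
    so with y = x and X = D we set alpha = lam / len(x) to match the objective above.
    """
    lasso = Lasso(alpha=lam / len(x), fit_intercept=False, max_iter=10_000)
    lasso.fit(D, x)
    return lasso.coef_
```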
- We also enforce that the columns of $D$ have norm $1$, because if we don't, the sparsity objective can 'cheat': the entries of $D$ grow to compensate for the entries of $z$ being shrunk toward zero.
- This optimization is done with the Iterative Shrinkage-Thresholding Algorithm (ISTA), because the non-smooth sparsity penalty makes standard gradient-based tools hard to apply (a sketch is given below).

Objective. Now, how do we get $D$? Sometimes it's obvious (e.g., sentences are composed of words), but in most cases it is not. We obtain the optimal $D$ with the following objective:

$$\min_{D,\, \{z_i\}} \sum_i \tfrac{1}{2} \lVert x_i - D z_i \rVert_2^2 + \lambda \lVert z_i \rVert_1 \quad \text{s.t.} \ \lVert D_{:,j} \rVert_2 = 1 \ \ \forall j$$
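Here is a minimal sketch of ISTA for the sparse coding (inference) step, under the objective above. The step size $1/\lVert D \rVert_2^2$ and iteration count are illustrative choices, not prescribed by the text.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1: shrink each entry toward zero by t.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(x, D, lam=0.1, n_iters=200):
    """Sparse coding of a single datapoint x given a dictionary D (columns = atoms)."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2       # 1/L, L = Lipschitz constant of the smooth part
    z = np.zeros(D.shape[1])
    for _ in range(n_iters):
        grad = D.T @ (D @ z - x)                 # gradient of 0.5 * ||x - D z||^2
        z = soft_threshold(z - step * grad, step * lam)   # gradient step + shrinkage enforces sparsity
    return z
```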
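One common way to attack this joint objective, sketched below under the same assumptions, is to alternate between the two blocks of variables: sparse-code every datapoint with $D$ fixed (using `ista` from the sketch above), then update $D$ by least squares and re-normalize its columns to satisfy the unit-norm constraint. This is an illustrative scheme, not necessarily the exact procedure the notes have in mind.

```python
import numpy as np

def learn_dictionary(X, k, lam=0.1, n_outer=30, seed=0):
    """Alternating minimization for the dictionary learning objective.

    X: (d, n) data matrix with datapoints as columns; k: number of dictionary atoms.
    Uses ista() from the previous sketch for the sparse coding step.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    D = rng.standard_normal((d, k))
    D /= np.linalg.norm(D, axis=0, keepdims=True)         # start with unit-norm columns
    for _ in range(n_outer):
        # 1) Sparse coding: fix D, infer each latent code z_i.
        Z = np.stack([ista(X[:, i], D, lam) for i in range(n)], axis=1)   # (k, n)
        # 2) Dictionary update: fix Z, least-squares fit D = X Z^T (Z Z^T)^+,
        #    then re-normalize columns to enforce the norm-1 constraint.
        D = X @ Z.T @ np.linalg.pinv(Z @ Z.T)
        D /= np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1e-12)
    return D, Z
```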