Probabilistic Generative Models

Motivation. The inspiration comes from maximum likelihood estimation. Consider classifying into two classes, $C_{1}$ and $C_{2}$. We treat the weights $w$ as the parameters of a probability distribution over the classes and work with two quantities, shown here for class $C_{1}$:

$p (C_{1} \mid x)$: given a datapoint $x$, the probability that it belongs to class $1$. This is what we want to find.
$p (x \mid C_{1})$: given that the data is in class $1$, the distribution of the datapoints. We can estimate this from the data, and we will construct the former from the latter.

From $x$ as a Multivariate Normal Distribution

If $x$, given class $C_{1}$, follows a multivariate normal distribution, its probability density is:

$$
p (x \mid C_{1}) = \frac{1}{(2 \pi)^{D/2} \, |\Sigma|^{1/2}} \exp\left( - \frac{1}{2} (x - \mu_{1})^{\top} \Sigma^{-1} (x - \mu_{1}) \right)
$$

where

$\mu_{1}$ is the mean of the datapoints in class $1$
$\Sigma$ is the covariance matrix of this distribution

We do not know either of these yet. Then, using Bayes' rule:

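$$
p (C_{1} \mid x) = \frac{p (x \mid C_{1}) \, p (C_{1})}{p (x \mid C_{1}) \, p (C_{1}) + p (x \mid C_{2}) \, p (C_{2})} \qquad \text{by Bayes' rule}
$$

For a general class $C_{k}$ among $n$ classes, define $a_{j} := \ln \left[ p (x \mid C_{j}) \, p (C_{j}) \right]$. Then:

$$
\begin{aligned}
p (C_{k} \mid x) &= \frac{\exp \left[ \ln p (x \mid C_{k}) \, p (C_{k}) \right]}{\sum_{\forall j} \exp \left[ \ln p (x \mid C_{j}) \, p (C_{j}) \right]} && \text{add exp and log} \\
&= \frac{\exp (a_{k})}{\sum_{\forall j} \exp (a_{j})} && \text{by definition of } a_{j} \\
&= \sigma (a_{k} \mid a_{1}, \dots, a_{n}) && \text{by definition of softmax}
\end{aligned}
$$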

where $\sigma$ is the softmax function (see ^unj2yy for the definition). Softmax ensures this is a probability distribution.

Now we calculate $a_{k}$ by substituting the multivariate normal pdf into it:

$$
\begin{aligned}
a_{k} &= \ln p (x \mid C_{k}) \, p (C_{k}) \\
&= \underbrace{\ln \frac{1}{(2 \pi)^{D/2}} + \ln \frac{1}{|\Sigma|^{1/2}}}_{\text{let } C} - \frac{1}{2} (x^{\top} - \mu_{k}^{\top}) \Sigma^{-1} (x - \mu_{k}) + \ln p (C_{k}) \\
&= C - \frac{1}{2} x^{\top} \Sigma^{-1} x + \frac{1}{2} x^{\top} \Sigma^{-1} \mu_{k} + \frac{1}{2} \mu_{k}^{\top} \Sigma^{-1} x \underbrace{- \frac{1}{2} \mu_{k}^{\top} \Sigma^{-1} \mu_{k} + \ln p (C_{k})}_{\text{let } w_{k, 0}} \\
&= C - \frac{1}{2} x^{\top} \Sigma^{-1} x + \underbrace{\mu_{k}^{\top} \Sigma^{-1}}_{\text{let } w_{k}^{\top}} x + w_{k, 0} && \Sigma^{-1} \text{ is symmetric} \\
&= w_{k}^{\top} x + w_{k, 0} - \frac{1}{2} x^{\top} \Sigma^{-1} x + C
\end{aligned}
$$
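The expansion of $a_{k}$ can be checked numerically. The following is a minimal sketch, assuming a 2-dimensional $x$ so that the matrix inverse can be written by hand with only the standard library. All numeric values (`Sigma`, `mu_k`, `prior_k`, `x`) and the helper names (`inv2`, `matvec`, `dot`) are hypothetical and chosen only for illustration.

```python
import math

def inv2(S):
    """Return the inverse and determinant of a 2x2 matrix (nested lists)."""
    (a, b), (c, d) = S
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]], det

def matvec(M, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

def dot(u, v):
    """Inner product of two length-2 vectors."""
    return u[0] * v[0] + u[1] * v[1]

# Hypothetical values, for illustration only.
D = 2
Sigma = [[2.0, 0.5], [0.5, 1.0]]  # symmetric covariance matrix
mu_k = [1.0, -1.0]                # class mean
prior_k = 0.3                     # p(C_k)
x = [0.5, 2.0]                    # datapoint

Sigma_inv, det = inv2(Sigma)

# Direct evaluation: a_k = ln p(x | C_k) + ln p(C_k)
diff = [x[0] - mu_k[0], x[1] - mu_k[1]]
log_density = (-0.5 * D * math.log(2 * math.pi)
               - 0.5 * math.log(det)
               - 0.5 * dot(diff, matvec(Sigma_inv, diff)))
a_direct = log_density + math.log(prior_k)

# Expanded form: a_k = w_k^T x + w_{k,0} - 1/2 x^T Sigma^{-1} x + C
C = -0.5 * D * math.log(2 * math.pi) - 0.5 * math.log(det)
w_k = matvec(Sigma_inv, mu_k)                     # w_k = Sigma^{-1} mu_k
w_k0 = -0.5 * dot(mu_k, w_k) + math.log(prior_k)  # bias term w_{k,0}
a_expanded = (dot(w_k, x) + w_k0
              - 0.5 * dot(x, matvec(Sigma_inv, x)) + C)

print(a_direct, a_expanded)  # the two values should agree up to rounding
```

The two printed values agree only because $\Sigma^{-1}$ is symmetric, which is what lets the two cross terms combine into $\mu_{k}^{\top} \Sigma^{-1} x$.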

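Finally, we plug this into our softmax function. Because the same $\Sigma$ is used for every class, the terms $- \frac{1}{2} x^{\top} \Sigma^{-1} x + C$ do not depend on the class index. They appear as a common factor in the numerator and in every term of the denominator, so they cancel:

$$
p (C_{k} \mid x) = \frac{\exp \left( w_{k}^{\top} x + w_{k, 0} - \frac{1}{2} x^{\top} \Sigma^{-1} x + C \right)}{\sum_{\forall j} \exp \left( w_{j}^{\top} x + w_{j, 0} - \frac{1}{2} x^{\top} \Sigma^{-1} x + C \right)} = \frac{\exp \left( w_{k}^{\top} x + w_{k, 0} \right)}{\sum_{\forall j} \exp \left( w_{j}^{\top} x + w_{j, 0} \right)}
$$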
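The last equality relies on softmax being unchanged when the same constant is added to every input. The sketch below illustrates that property on made-up numbers: `linear_scores` stands in for the values $w_{k}^{\top} x + w_{k, 0}$ of three classes and `common` stands in for the class-independent term. Both are hypothetical values, not derived from any data.

```python
import math

def softmax(logits):
    """Softmax over a list of floats.

    Subtracting the maximum first avoids overflow in exp; this is safe
    precisely because softmax ignores a shift shared by all inputs.
    """
    m = max(logits)
    exps = [math.exp(a - m) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical values of w_k^T x + w_{k,0} for three classes.
linear_scores = [1.2, -0.4, 0.3]

# Hypothetical value of the term shared by all classes.
common = -7.5

with_common = softmax([s + common for s in linear_scores])
without_common = softmax(linear_scores)

print(with_common)
print(without_common)  # same probabilities up to rounding
```

In practice this means the posterior can be computed from the linear scores alone, without evaluating the quadratic term in $x$ or the normalising constant of the Gaussian.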
