**Motivation.** The inspiration is from Maximum Likelihood Estimators. Consider classifying into two classes $C_1, C_2$. If we consider $w$, the weights, as a parameter, then $p(C_k \mid x)$ is a probability distribution for class $C_k$, the classification output, as such:
- $p(C_k \mid x)$: given datapoint $x$, the probability (distribution) of it being in class $C_k$. The pdf of this is what we want to find.
- $p(x \mid C_k)$: given that the data is in class $C_k$, what is the distribution of the datapoints? We know this from the data. We will construct the former from the latter.
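"Constructing the former from the latter" starts by estimating the class-conditional quantities from labelled data. A minimal sketch with NumPy, assuming per-class Gaussians with a shared covariance; the function name `fit_class_conditionals` is illustrative, not from the original:

```python
import numpy as np

def fit_class_conditionals(X, y):
    """MLE estimates of the per-class means, the shared (pooled)
    covariance, and the class priors p(C_k) from labelled data (X, y)."""
    classes = np.unique(y)
    n, d = X.shape
    means = {k: X[y == k].mean(axis=0) for k in classes}
    priors = {k: np.mean(y == k) for k in classes}
    Sigma = np.zeros((d, d))
    for k in classes:
        diff = X[y == k] - means[k]   # deviations from each point's own class mean
        Sigma += diff.T @ diff
    Sigma /= n                        # pooled over all n datapoints
    return means, Sigma, priors

# Tiny labelled dataset: two classes separated along the first axis.
X = np.array([[0., 0.], [0., 2.], [4., 0.], [4., 2.]])
y = np.array([0, 0, 1, 1])
means, Sigma, priors = fit_class_conditionals(X, y)
```

Pooling the covariance across classes is what later makes the discriminants linear in $x$.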
**From $p(x \mid C_k)$ as a Multivariate Normal Distribution**
If we model $p(x \mid C_k)$ as a multivariate normal distribution, it has the probability density:

$$p(x \mid C_k) = \frac{1}{(2\pi)^{D/2}\,|\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x - \mu_k)^\top \Sigma^{-1} (x - \mu_k)\right)$$

where
- $\mu_k$ is the mean of the datapoints in class $C_k$,
- $\Sigma$ is the covariance matrix for this distribution.

Both of these we do not know yet. Then, using Bayes' Rule:

$$p(C_k \mid x) = \frac{p(x \mid C_k)\,p(C_k)}{\sum_j p(x \mid C_j)\,p(C_j)} = \frac{e^{a_k}}{\sum_j e^{a_j}} = \operatorname{softmax}(a)_k, \qquad a_k = \ln\!\big(p(x \mid C_k)\,p(C_k)\big)$$
where
- $p(C_k)$ is the class prior; see ^unj2yy for the definition,
- the softmax ensures this is a probability distribution.

Now we calculate $a_k$. Substituting the multivariate normal pdf into $a_k = \ln\!\big(p(x \mid C_k)\,p(C_k)\big)$:

$$a_k = -\tfrac{1}{2}(x - \mu_k)^\top \Sigma^{-1}(x - \mu_k) - \tfrac{D}{2}\ln(2\pi) - \tfrac{1}{2}\ln|\Sigma| + \ln p(C_k)$$

Expanding the quadratic and dropping every term that is the same for all classes (such terms cancel in the softmax, since $\Sigma$ is shared), $a_k$ reduces to a linear function of $x$:

$$a_k = w_k^\top x + w_{k0}, \qquad w_k = \Sigma^{-1}\mu_k, \qquad w_{k0} = -\tfrac{1}{2}\mu_k^\top \Sigma^{-1}\mu_k + \ln p(C_k)$$
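Substituting the shared-covariance Gaussian pdf into $a_k$ leaves it linear in $x$ up to a class-independent constant, which can be checked numerically. A sketch with made-up parameters (all values here are illustrative, not from the original):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up parameters for two classes sharing one covariance (illustrative).
mu = {0: np.array([0., 1.]), 1: np.array([3., -1.])}
Sigma = np.array([[2., 0.3], [0.3, 1.]])
prior = {0: 0.4, 1: 0.6}
Sigma_inv = np.linalg.inv(Sigma)
D = 2

def log_joint(x, k):
    """ln(p(x | C_k) p(C_k)) with the full multivariate normal pdf."""
    diff = x - mu[k]
    log_pdf = (-0.5 * diff @ Sigma_inv @ diff
               - 0.5 * D * np.log(2 * np.pi)
               - 0.5 * np.log(np.linalg.det(Sigma)))
    return log_pdf + np.log(prior[k])

def linear_a(x, k):
    """a_k = w_k^T x + w_k0 after dropping class-independent terms."""
    w = Sigma_inv @ mu[k]
    w0 = -0.5 * mu[k] @ Sigma_inv @ mu[k] + np.log(prior[k])
    return w @ x + w0

x = rng.normal(size=2)
# The two forms should differ by the same constant for every class.
gap = {k: log_joint(x, k) - linear_a(x, k) for k in (0, 1)}
```

The leftover constant is $-\tfrac{1}{2}x^\top\Sigma^{-1}x - \tfrac{D}{2}\ln(2\pi) - \tfrac{1}{2}\ln|\Sigma|$, identical for both classes, so it cancels in the softmax.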
Finally, we plug this into our softmax function to get the classification output:

$$p(C_k \mid x) = \frac{\exp(w_k^\top x + w_{k0})}{\sum_j \exp(w_j^\top x + w_{j0})}$$
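Putting the pieces together, the posterior can be sketched end to end: form the linear discriminants from the Gaussian parameters, then softmax them. A minimal sketch; the parameter values are made up for illustration:

```python
import numpy as np

# Made-up shared-covariance Gaussian class-conditionals (illustrative).
mu = np.array([[0., 1.], [3., -1.]])   # row k = mean of class k
Sigma = np.array([[2., 0.3], [0.3, 1.]])
prior = np.array([0.4, 0.6])
Sigma_inv = np.linalg.inv(Sigma)

# Linear discriminants: w_k = Sigma^{-1} mu_k and
# w_k0 = -1/2 mu_k^T Sigma^{-1} mu_k + ln p(C_k).
W = mu @ Sigma_inv                     # row k = w_k^T (Sigma_inv is symmetric)
w0 = -0.5 * np.einsum('ki,ij,kj->k', mu, Sigma_inv, mu) + np.log(prior)

def posterior(x):
    """p(C_k | x) = softmax(a)_k with a_k = w_k^T x + w_k0."""
    a = W @ x + w0
    a -= a.max()                       # subtract max for numerical stability
    e = np.exp(a)
    return e / e.sum()

p = posterior(np.array([0., 0.]))
```

By construction the output is a proper distribution over classes, and a query point at a class mean is assigned to that class with high probability.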