The softmax function takes a vector $x$ of dimension $n$ and normalizes it into a probability distribution over $n$ outcomes:

$$\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}, \quad i = 1, \dots, n$$
Numerical Instability
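When softmax is computed in floating point, large input values make the exponentials overflow, so the result becomes unstable. Instead, observe that for an input vector $x$ and any constant $c$:

$$\frac{e^{x_i + c}}{\sum_j e^{x_j + c}} = \frac{e^{c}\, e^{x_i}}{e^{c} \sum_j e^{x_j}} = \frac{e^{x_i}}{\sum_j e^{x_j}}$$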
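This means we can add any constant to every input without changing the result. Choosing the negative of the largest input makes the largest exponent zero, which prevents overflow and thus infs. It also keeps the denominator at least 1, so underflow in the smaller terms cannot cause a division by zero.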
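The shift-invariance argument can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation; the function names `softmax_naive` and `softmax_stable` are made up for this example, and the shift constant is assumed to be the negative of the largest input.

```python
import math


def softmax_naive(x):
    """Direct translation of the definition; overflows for large inputs."""
    exps = [math.exp(v) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_stable(x):
    """Subtract the maximum first, so the largest exponent is exactly 0."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)  # at least 1.0, since exp(0) == 1 is one of the terms
    return [e / total for e in exps]


# Shifting every input by the same constant leaves the output unchanged.
small = softmax_stable([0.0, 1.0, 2.0])
large = softmax_stable([1000.0, 1001.0, 1002.0])
print(small)
print(large)
```

With plain Python floats, `softmax_naive([1000.0, 1001.0, 1002.0])` raises `OverflowError` because `math.exp(1000.0)` is out of range. With NumPy float64 arrays the same naive computation would instead produce `inf` in the numerator and denominator and therefore `nan` in the result.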