def. Sigmoid Function is a simple S-shaped function.
def. Softmax function takes in a vector of dimension and normalizes it into a probability distribution with different outcomes:
Sometimes if we input each separately we see it as a function that already knows :
and it is the single-element (scalar) , i.e. probability of class .
Numerical Instability
When calculating softmax in floating point, when the values get big there are instabilities. Instead, observe:
Which means we can add any constant to prevent overflow/underflow and thus inf
s.