def. Sigmoid Function is a simple S-shaped function.

def. Softmax function takes in a vector of dimension and normalizes it into a probability distribution with different outcomes:

Sometimes if we input each separately we see it as a function that already knows :

and it is the single-element (scalar) , i.e. probability of class .

Numerical Instability

When calculating softmax in floating point, when the values get big there are instabilities. Instead, observe:

Which means we can add any constant to prevent overflow/underflow and thus infs.