Softmax with Loss Layer with Numpy
Softmax with Loss Layer
The softmax with loss layer is a layer that combines the softmax function and the cross-entropy loss function. It is used extensively as the output layer because the derivative of softmax combined with cross-entropy loss requires very little calculation. Let's first see how the softmax function and the cross-entropy loss function each work.
Softmax Function
The softmax function, or normalized exponential function, is a generalization of the logistic function to multiple dimensions.
The softmax function takes as input a vector a of K real numbers and normalizes it into a probability distribution of K probabilities proportional to the exponentials of the inputs. After applying softmax, each component lies in the interval (0, 1), and the components sum to 1.
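As a small sketch, the softmax function can be written in NumPy like this (the max of the input is subtracted first, a common trick to avoid overflow that does not change the result):

```python
import numpy as np

def softmax(a):
    # Subtracting the max keeps exp() from overflowing; softmax is shift-invariant.
    exp_a = np.exp(a - np.max(a))
    return exp_a / np.sum(exp_a)

probs = softmax(np.array([2.0, 1.0, 0.1]))
# probs sums to 1, and the largest input gets the largest probability
```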
Cross Entropy Loss Function
Entropy of a random variable is the level of uncertainty inherent in the variable's possible outcomes. Cross-entropy loss, or log loss, measures the performance of a classification model whose output is a probability value between 0 and 1.
Categorical cross-entropy is used when the true labels are one-hot encoded. For example, in a 3-class classification problem the true values are [1,0,0], [0,1,0], and [0,0,1].
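One-hot labels like these can be built by indexing rows of an identity matrix; a minimal sketch (the variable names are my own):

```python
import numpy as np

# Integer class labels for three samples of a 3-class problem.
labels = np.array([0, 1, 2])

# Each label selects the matching row of the 3x3 identity matrix.
one_hot = np.eye(3)[labels]
# one_hot -> [[1,0,0], [0,1,0], [0,0,1]]
```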
Comparing two models A and B on an example whose true label is class 0, model A performs better if it assigns a higher probability to that class. If model A's first output is 0.9, its loss is -log 0.9 ≈ 0.105, which sits low on the -log x curve; a prediction of 0.1 would instead give -log 0.1 ≈ 2.303.
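The comparison can be checked numerically. This sketch assumes a one-hot label t and adds a tiny epsilon inside the log to avoid log(0):

```python
import numpy as np

def cross_entropy_error(y, t):
    # t is one-hot, so only the true class's -log(probability) contributes.
    delta = 1e-7  # guards against log(0)
    return -np.sum(t * np.log(y + delta))

t = np.array([1, 0, 0])  # true class is index 0
loss_good = cross_entropy_error(np.array([0.9, 0.05, 0.05]), t)  # -log 0.9, small
loss_bad = cross_entropy_error(np.array([0.1, 0.45, 0.45]), t)   # -log 0.1, large
```

A confident correct prediction (0.9) yields a much smaller loss than an unconfident one (0.1).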
Softmax with Loss Layer
The softmax with loss layer combines the softmax function and the cross-entropy loss function.
The derivative of the cross-entropy loss with respect to the input of the softmax function can be calculated as

∂L/∂x_k = y_k − t_k

where y is the softmax output and t is the one-hot label. We get the simple result dx = y − t. Below is the graph of the forward and backward pass of the softmax with loss layer.
We can translate this into code. Below is the softmax with loss layer.
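Here is a sketch of such a layer, assuming batched inputs and one-hot labels; the class and helper names are my own choices:

```python
import numpy as np

def softmax(a):
    # Row-wise softmax; max is subtracted for numerical stability.
    a = a - np.max(a, axis=-1, keepdims=True)
    exp_a = np.exp(a)
    return exp_a / np.sum(exp_a, axis=-1, keepdims=True)

def cross_entropy_error(y, t):
    # Mean cross-entropy over the batch; t is one-hot.
    delta = 1e-7  # guards against log(0)
    return -np.sum(t * np.log(y + delta)) / y.shape[0]

class SoftmaxWithLoss:
    def __init__(self):
        self.y = None  # softmax output
        self.t = None  # one-hot labels

    def forward(self, x, t):
        self.t = t
        self.y = softmax(x)
        return cross_entropy_error(self.y, t)

    def backward(self, dout=1):
        # The gradient w.r.t. x simplifies to (y - t), averaged over the batch.
        batch_size = self.t.shape[0]
        return dout * (self.y - self.t) / batch_size
```

During training, the backward pass simply propagates (y − t) / batch_size, which is what makes this combined layer so cheap to differentiate.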
I hope this article helped you understand the softmax with loss layer more clearly. Thanks for reading :)