Softmax with Loss Layer with NumPy

Riley Learning
3 min read · Jun 23, 2022


Softmax with Loss Layer

The softmax with loss layer is a layer that combines the softmax function and the cross-entropy loss function. It is widely used as the output layer of classification networks because, when the two are combined, the derivative becomes very cheap to compute. Let's first look at how the softmax function and the cross-entropy loss function work, each on its own.

Softmax Function

The softmax function, or normalized exponential function, is a generalization of the logistic function to multiple dimensions.

How the Softmax Function Works

The softmax function takes as input a vector a of K real numbers and normalizes it into a probability distribution consisting of K probabilities proportional to the exponentials of the input numbers. After applying softmax, each component lies in the interval (0, 1), and the components sum to 1.

Softmax Function
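In formulas, the k-th output is y_k = exp(a_k) / Σ_i exp(a_i). A minimal NumPy sketch of the function might look like this (the maximum is subtracted before exponentiating only for numerical stability; it does not change the result):

```python
import numpy as np

def softmax(a):
    # subtract the max so that np.exp() does not overflow;
    # the shift cancels out and leaves the probabilities unchanged
    a = a - np.max(a, axis=-1, keepdims=True)
    exp_a = np.exp(a)
    return exp_a / np.sum(exp_a, axis=-1, keepdims=True)
```

For example, softmax(np.array([0.3, 2.9, 4.0])) gives roughly [0.018, 0.245, 0.737], which sums to 1.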

Cross Entropy Loss Function

The entropy of a random variable is the level of uncertainty inherent in the variable's possible outcomes. Cross-entropy loss, or log loss, measures the performance of a classification model whose output is a probability value between 0 and 1.

Cross Entropy Function
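In formulas, with predicted probabilities y and a one-hot label t, the loss is E = −Σ_k t_k log(y_k). A minimal NumPy sketch (the small constant only guards against log(0)):

```python
import numpy as np

def cross_entropy_error(y, t):
    # y: predicted probabilities, t: one-hot encoded true labels
    # delta prevents np.log(0), which would return -inf
    delta = 1e-7
    return -np.sum(t * np.log(y + delta))
```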

Categorical cross-entropy is used when the true labels are one-hot encoded; for example, in a 3-class classification problem the true labels are [1,0,0], [0,1,0], and [0,0,1].

An Example of Cross-entropy Loss

In the example comparing models A and B, model A performs better on the sample whose true class is at index 0: model A assigns that class a probability of 0.9, so its loss is −log(0.9) ≈ 0.105, which sits low on the −log(x) curve.
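For illustration only (the full probability vector for model A is an assumption here, not a value from the example above), suppose model A outputs [0.9, 0.05, 0.05] for a sample whose true class is index 0:

```python
import numpy as np

t = np.array([1, 0, 0])            # one-hot label: true class is index 0
y_a = np.array([0.9, 0.05, 0.05])  # hypothetical prediction of model A

# reusing cross_entropy_error() from the sketch above
print(cross_entropy_error(y_a, t))  # ~0.105, i.e. -log(0.9)
```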

Softmax with Loss Layer

The softmax with loss layer combines the softmax function and the cross-entropy loss function into a single output layer.

The derivative of the cross-entropy loss with respect to the input of the softmax function can be calculated with the chain rule.
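As a sketch of the standard derivation (writing a for the softmax input, y for its output, t for the one-hot label, and δ_{ik} for the Kronecker delta):

```latex
L = -\sum_k t_k \log y_k, \qquad
y_k = \frac{e^{a_k}}{\sum_j e^{a_j}}, \qquad
\frac{\partial y_i}{\partial a_k} = y_i(\delta_{ik} - y_k)

\frac{\partial L}{\partial a_k}
  = \sum_i \frac{\partial L}{\partial y_i}\,\frac{\partial y_i}{\partial a_k}
  = \sum_i \left(-\frac{t_i}{y_i}\right) y_i(\delta_{ik} - y_k)
  = -t_k + y_k \sum_i t_i
  = y_k - t_k
```

The last step uses Σ_i t_i = 1, since t is one-hot encoded.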

We get a remarkably simple result: dx = y − t, the predicted probabilities minus the one-hot labels. Below is the computational graph of the forward and backward pass of the softmax with loss layer.

We can now translate this into code. Below is the softmax with loss layer implemented with NumPy.
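Here is a minimal sketch of the layer as a class, building on the softmax and cross_entropy_error functions sketched above; it assumes x and t are 2-D arrays of shape (batch_size, num_classes) and that t is one-hot encoded:

```python
import numpy as np

class SoftmaxWithLoss:
    def __init__(self):
        self.loss = None  # cross-entropy loss of the last forward pass
        self.y = None     # softmax output
        self.t = None     # one-hot true labels

    def forward(self, x, t):
        self.t = t
        self.y = softmax(x)                 # forward: softmax ...
        batch_size = x.shape[0]
        # ... then cross-entropy, averaged over the batch
        self.loss = cross_entropy_error(self.y, self.t) / batch_size
        return self.loss

    def backward(self, dout=1):
        # gradient of the averaged loss w.r.t. x is simply (y - t) / batch_size
        batch_size = self.t.shape[0]
        dx = (self.y - self.t) * dout / batch_size
        return dx

# example usage
x = np.array([[0.3, 2.9, 4.0]])    # raw scores (logits) for one sample
t = np.array([[0.0, 0.0, 1.0]])    # true class is index 2

layer = SoftmaxWithLoss()
loss = layer.forward(x, t)          # ~0.306, i.e. -log(0.737)
dx = layer.backward()               # ~[[ 0.018, 0.245, -0.263]]
```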

I hope this article helped you understand the softmax with loss layer more clearly. Thanks for reading :)
