Forward pass
Let's take the following neural network:

We'll look specifically at how the neuron a_4 (the first one from the top in the second layer) gets its activation calculated.
The input of a neuron is the sum of the activations of all neurons connected to it, each multiplied by its respective weight. Since the layer is fully connected, the input for neuron a_4 includes the activations of all the neurons in the previous layer:
a_4 = w_1*a_1 + w_2*a_2 + w_3*a_3
The relevant weights are shown in red.
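As a minimal sketch of that weighted sum (the activation and weight values below are made up purely for illustration):

```python
# A single neuron's input as a plain weighted sum.
# The activation and weight values are invented for illustration.
a = [0.5, -1.0, 2.0]    # activations a_1, a_2, a_3 of the previous layer
w = [0.1, 0.4, -0.2]    # weights w_1, w_2, w_3 leading into a_4

a_4 = sum(w_i * a_i for w_i, a_i in zip(w, a))
print(a_4)  # 0.1*0.5 + 0.4*(-1.0) + (-0.2)*2.0 = -0.75
```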
Remember, in the simplest case the network receives a vector as an input. But since we need a lot of training data to achieve good results, it's important to be as time-efficient as possible. The training process can be optimized using batches, i.e. processing several data points in one forward pass. In that case, the neural network receives a matrix as an input, with each row being the input vector of one data point in the batch.
So since we're dealing with sums and matrices, matrix multiplication is actually a very natural choice. By multiplying the input matrix with the transpose of the weight matrix, we create a new matrix that contains exactly the right sums for each neuron. Here's a demonstration of how the specific activation a_4 is formed in this case. Most weights and activations are omitted from the picture, since it would be unreadable otherwise:

Let's assume the batch size is 1. In this case, the network takes a matrix of shape (1, 3) as input, since the input layer has 3 neurons. Now we have to multiply each input neuron's activation by the correct weight to get the activation of the first neuron in the second layer. We start with a weight matrix of shape (4, 3), since the next layer has 4 neurons and this layer has 3. By transposing, we move the three weights we want to multiply into the columns, creating a matrix of shape (3, 4). And voilà, now we can do a matrix multiplication, since the inner dimensions match. The resulting matrix will be of shape (1, 4), which is exactly (batch size, layer size), so we can continue the multiplications with the next layer.
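To make these shapes concrete, here's a small NumPy sketch of this step. The random values stand in for real inputs and weights, and the names x and W are just illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

batch_size = 1
x = rng.random((batch_size, 3))  # input matrix: one row per data point, 3 input neurons
W = rng.random((4, 3))           # weight matrix: 4 next-layer neurons, 3 in this layer

# Transposing W puts each next-layer neuron's incoming weights into a column,
# so the inner dimensions match: (1, 3) @ (3, 4) -> (1, 4).
activations = x @ W.T

print(activations.shape)  # (1, 4), i.e. (batch size, layer size)
```

With a larger batch, only the first dimension changes: a (32, 3) input yields a (32, 4) output, and each row is still handled independently.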