I am training an FFNN for MNIST with a batch size of 32. My first linear layer has 100 neurons, defined as nn.Linear(784, 100). When I check the shape of the layer using model.weight.shape, I get [100, 784]. My input has the shape [32, 784]. It was my understanding that the weights are matrix-multiplied with the input, but I cannot see how that works between a weight tensor of shape [100, 784] and an input tensor of shape [32, 784]. Shouldn't the weight tensor be [784, 100]? Or does PyTorch transpose the tensor before the multiplication?
The weight is transposed before the multiplication, as seen e.g. here. nn.Linear computes y = x @ W.T + b, so the [32, 784] input is multiplied with the transposed [784, 100] weight, giving a [32, 100] output.
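A minimal sketch of the shapes involved, verifying that the layer's output matches the explicit multiplication with the transposed weight (the tensor values here are random placeholders):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(784, 100)   # weight is stored as [out_features, in_features]
x = torch.randn(32, 784)      # batch of 32 flattened MNIST images

out = layer(x)                # internally computes x @ weight.T + bias
manual = x @ layer.weight.T + layer.bias

print(layer.weight.shape)                      # torch.Size([100, 784])
print(out.shape)                               # torch.Size([32, 100])
print(torch.allclose(out, manual, atol=1e-6))  # True
```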
I have a follow-up question.
The gradient of the weights of the above-mentioned layer, model.weight.grad, has the shape [100, 784] after J.backward() has been called.
So model.weight.grad has the same shape as the stored parameter model.weight, which is the transpose of the [784, 100] matrix that is effectively used during forward propagation.
I am using the SGD optimizer with just the learning rate hyperparameter to update the parameters. Am I correct to assume that the weights for the layer are updated as follows?

W_new = W_old - lr * model.weight.grad
And this works out because PyTorch stores model.weight in the same shape as model.weight.grad?
Yes, the parameters and their corresponding gradients are stored with the same shape and layout, so the update can be applied elementwise without any manipulations or transposes.
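A quick sketch confirming this: a plain SGD step (no momentum or weight decay) produces exactly the same result as subtracting lr * grad from the weight by hand. The loss J here is a dummy sum, just to get gradients flowing:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(784, 100)
x = torch.randn(32, 784)
lr = 0.1

# Snapshot the weight before the update.
w_before = layer.weight.detach().clone()

# Dummy scalar loss J so backward() can be called.
J = layer(x).sum()
J.backward()

# Plain SGD: only the learning rate, no momentum or weight decay.
opt = torch.optim.SGD(layer.parameters(), lr=lr)
opt.step()

# The optimizer performed exactly w_new = w_old - lr * grad,
# elementwise on the [100, 784] tensors, with no transposes.
manual = w_before - lr * layer.weight.grad
print(torch.allclose(layer.weight, manual))  # True
```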
That makes so much sense!! Thank you so much!!!