How to make sense of PyTorch linear layer shapes?

I am training a FFNN for MNIST with a batch size of 32. My first linear layer has 100 neurons and is defined as nn.Linear(784, 100). When I check the shape of the layer using model[0].weight.shape, I get [100, 784]. My input has the shape [32, 784]. My understanding was that the weights are matrix-multiplied with the input, but I cannot see how that multiplication works between a weight tensor of shape [100, 784] and an input tensor of shape [32, 784]. Shouldn't the weight tensor be [784, 100]? Or does PyTorch transpose the tensor before the multiplication?


Yes, the weight is transposed before the multiplication, as seen e.g. here: the linear layer computes x @ weight.T + bias, so the weight is stored as [out_features, in_features].
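A minimal sketch illustrating this (assuming the shapes from your post, 784 inputs, 100 outputs, batch size 32):

```python
import torch
import torch.nn as nn

# A linear layer with 784 inputs and 100 outputs stores its weight as
# [out_features, in_features], i.e. [100, 784].
layer = nn.Linear(784, 100)
print(layer.weight.shape)  # torch.Size([100, 784])

x = torch.randn(32, 784)   # batch of 32 flattened MNIST images
out = layer(x)             # internally computes x @ weight.T + bias
print(out.shape)           # torch.Size([32, 100])

# The same result, written out explicitly with the transpose:
manual = x @ layer.weight.T + layer.bias
print(torch.allclose(out, manual, atol=1e-5))  # True
```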


I have a follow-up question.
The gradient of the weights of the above-mentioned layer, model[0].weight.grad, has the shape [100, 784] after J.backward() has been called.
That matches the stored shape of model[0].weight, which is the transpose of the [784, 100] shape effectively used during forward propagation.
I am using the SGD optimizer with just the learning-rate hyperparameter to update the parameters. Am I correct to assume that the weights of the layer are updated as weight = weight - lr * weight.grad?
And this works out because PyTorch stores model[0].weight in the same shape as model[0].weight.grad?

Yes, the parameters and their corresponding gradients are stored with the same shape and layout, and thus the update can be applied without any manipulations/transposes etc.
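A short sketch of this, using a stand-in scalar loss J (just the sum of the outputs, for illustration) and a manual plain-SGD step:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 100))
x = torch.randn(32, 784)

J = model(x).sum()   # stand-in scalar loss, only to produce gradients
J.backward()

w, g = model[0].weight, model[0].weight.grad
print(w.shape, g.shape)  # torch.Size([100, 784]) torch.Size([100, 784])

# Plain SGD update: parameter and gradient share the same shape,
# so the subtraction needs no transpose.
lr = 0.1
with torch.no_grad():
    w -= lr * g
```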


That makes so much sense!! Thank you so much!!!