I am training an FFNN for MNIST with a batch size of 32. My first linear layer has 100 neurons, defined as nn.Linear(784, 100). When I check the shape of the layer using model.weight.shape, I get [100, 784]. My input has the shape [32, 784]. It was my understanding that the weights are matrix-multiplied with the input, but I cannot see how that works between a weight tensor of shape [100, 784] and an input tensor of shape [32, 784]. Shouldn't the weight tensor be [784, 100]? Or does PyTorch transpose the tensor before the multiplication?
The weight is transposed before the multiplication, as seen e.g. here. nn.Linear computes y = x @ W.T + b, so the [32, 784] input is multiplied with the transposed [784, 100] weight, giving a [32, 100] output.
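A minimal sketch of the shapes involved, verifying that the layer's output matches the explicit multiplication with the transposed weight (the tensor values here are random placeholders):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(784, 100)   # weight is stored as [out_features, in_features]
x = torch.randn(32, 784)      # batch of 32 flattened MNIST images

out = layer(x)                # internally computes x @ weight.T + bias
manual = x @ layer.weight.T + layer.bias

print(layer.weight.shape)                      # torch.Size([100, 784])
print(out.shape)                               # torch.Size([32, 100])
print(torch.allclose(out, manual, atol=1e-6))  # True
```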
I have a follow-up question.
The gradient of the weights of the above-mentioned layer, model.weight.grad, has the shape [100, 784] after J.backward() has been called.
So model.weight.grad has the same shape as the stored parameter model.weight, which is the transpose of the [784, 100] matrix that is effectively used during forward propagation.
I am using the SGD optimizer with just the learning rate hyperparameter to update the parameters. Am I correct to assume that the weights for the layer are updated as follows?

W_new = W_old - lr * model.weight.grad
And this works out because PyTorch stores model.weight in the same shape as model.weight.grad?
Yes, the parameters and their corresponding gradients are stored with the same shape and layout, so the update can be applied elementwise without any manipulations or transposes.
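A quick sketch confirming this: a plain SGD step (no momentum or weight decay) produces exactly the same result as subtracting lr * grad from the weight by hand. The loss J here is a dummy sum, just to get gradients flowing:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(784, 100)
x = torch.randn(32, 784)
lr = 0.1

# Snapshot the weight before the update.
w_before = layer.weight.detach().clone()

# Dummy scalar loss J so backward() can be called.
J = layer(x).sum()
J.backward()

# Plain SGD: only the learning rate, no momentum or weight decay.
opt = torch.optim.SGD(layer.parameters(), lr=lr)
opt.step()

# The optimizer performed exactly w_new = w_old - lr * grad,
# elementwise on the [100, 784] tensors, with no transposes.
manual = w_before - lr * layer.weight.grad
print(torch.allclose(layer.weight, manual))  # True
```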
That makes so much sense!! Thank you so much!!!