The weight (`W`) matrices for an `nn.Linear` layer are stored as `W.T`. That is, if we have 100 neurons in the input layer and 200 neurons in the next layer, our layer definition would be `nn.Linear(in_features=100, out_features=200)`, and I would expect the weight matrix to be of shape `(100, 200)`, since every neuron in the first layer has 200 connections into the second layer and the weights are "propagated" in that direction. However, the weights are transposed before being stored, for efficiency during backprop.
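For reference, a quick sanity check of the shape convention I'm describing (the forward pass computes `y = x @ W.T + b`, so the stored weight is `(out_features, in_features)`):

```python
import torch
import torch.nn as nn

# nn.Linear stores its weight as (out_features, in_features),
# the transpose of the (in, out) shape one might expect.
layer = nn.Linear(in_features=100, out_features=200)
print(layer.weight.shape)  # torch.Size([200, 100])

# The forward pass computes y = x @ W.T + b:
x = torch.randn(1, 100)
y_manual = x @ layer.weight.T + layer.bias
assert torch.allclose(layer(x), y_manual)
```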
Is this behavior restricted to `nn.Linear` layers, or is it implemented in all `nn` modules? I specifically want to know whether the internal weight matrices are transposed for an `RNN` layer. I can see that `weight_ih` (the input-to-hidden matrix) is stored transposed, but I cannot be sure about `weight_hh`, since it's a square matrix. I need to know because I am updating the weights manually for each connection, and transposed matrices might mean I am updating the wrong connections. Basically, I want to know "which neuron led the neuron in the subsequent layer to fire".
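One way to settle the `weight_hh` orientation, since its shape alone can't tell us: replicate a single RNN step by hand using the documented update `h_t = tanh(x_t @ W_ih.T + b_ih + h_{t-1} @ W_hh.T + b_hh)` and check it matches the module's output. If it does, then row `i` of `weight_hh_l0` holds the incoming weights of hidden unit `i`, i.e. `weight_hh_l0[i, j]` connects previous-step hidden unit `j` to hidden unit `i`. A small sketch (sizes chosen arbitrarily for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=3, hidden_size=5)

# Both matrices follow the same (out, in) convention:
print(rnn.weight_ih_l0.shape)  # torch.Size([5, 3]) -> (hidden_size, input_size)
print(rnn.weight_hh_l0.shape)  # torch.Size([5, 5]) -> (hidden_size, hidden_size)

x = torch.randn(1, 1, 3)   # (seq_len, batch, input_size)
h0 = torch.randn(1, 1, 5)  # (num_layers, batch, hidden_size)
out, _ = rnn(x, h0)

# One step by hand, treating both weights as stored transposed:
h_manual = torch.tanh(
    x[0] @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
    + h0[0] @ rnn.weight_hh_l0.T + rnn.bias_hh_l0
)
assert torch.allclose(out[0], h_manual, atol=1e-6)
```

If the assertion holds, the hidden-to-hidden matrix is stored the same way as `weight_ih`, and a manual update to connection (from `j`, to `i`) should touch `weight_hh_l0[i, j]`.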