Hello, I would like to ask how model.weight is ordered. I'm asking because when I print out the weights and their gradients, it seems like one neuron does not get updated. Does the last row correspond to the last neuron? Or is the order reversed, so that the last row corresponds to the first neuron of the decoder?
Note that my decoder layer is in_dim * 30.
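One way to check the ordering directly is a small gradient experiment. This is a minimal sketch that assumes the decoder is a PyTorch `nn.Linear(in_dim, 30)`, meaning 30 output neurons. For `nn.Linear`, `weight` has shape `(out_features, in_features)` and the forward pass computes `x @ weight.T + bias`, so row `i` holds the incoming weights of output neuron `i`. The value `in_dim = 8` is a hypothetical placeholder. The sketch builds a loss that depends only on the last output neuron and then looks at which rows of `weight.grad` are nonzero.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

in_dim = 8  # hypothetical input size, for illustration only
decoder = nn.Linear(in_dim, 30)  # assumed layout: 30 output neurons

# nn.Linear stores weight as (out_features, in_features)
print(decoder.weight.shape)  # torch.Size([30, 8])

x = torch.randn(4, in_dim)  # batch of 4 random inputs
y = decoder(x)              # shape (4, 30)

# Loss that depends only on the last output neuron (index 29)
loss = y[:, 29].sum()
loss.backward()

# Rows of weight.grad that received any gradient at all
row_has_grad = decoder.weight.grad.abs().sum(dim=1) > 0
nonzero_rows = row_has_grad.nonzero().flatten().tolist()
print(nonzero_rows)  # [29]: only the last row, so last row = last neuron
```

Running the same check on the real model, with the loss restricted to the neuron that seems stuck, should show whether its row is receiving gradient at all. If the layer is instead `nn.Linear(30, in_dim)`, the weight has `in_dim` rows, and each row corresponds to one of the `in_dim` outputs.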