LSTM gradient shape


I’m trying to understand LSTMs and how backpropagation works with them. I have a simple architecture:

import torch.nn as nn

class LSTMGenerator(nn.Module):
    def __init__(self, input_dim=100, hidden_dim=100, output_dim=216, num_layers=2):
        super(LSTMGenerator, self).__init__()
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.output_dim = output_dim
        self.num_layers = num_layers
        self.layer1 = nn.LSTM(self.input_dim, self.hidden_dim, self.num_layers, bidirectional=False, dropout=0.5)
        self.out = nn.Linear(self.hidden_dim, self.output_dim)
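For context, here is roughly how I run a forward pass (a minimal standalone sketch of the same layers; the sequence length of 30 is arbitrary, the batch size of 128 matches the gradient shapes below, and taking the last hidden state before the Linear layer is my assumption):

```python
import torch
import torch.nn as nn

# Same layers as in __init__, built standalone for illustration.
# batch_first is not set, so the LSTM expects (seq_len, batch, input_dim).
lstm = nn.LSTM(100, 100, 2, bidirectional=False, dropout=0.5)
out_layer = nn.Linear(100, 216)

x = torch.randn(30, 128, 100)   # (seq_len=30, batch=128, input_dim=100)
h, (h_n, c_n) = lstm(x)         # h: (30, 128, 100); h_n: (2, 128, 100)
y = out_layer(h_n[-1])          # last layer's final hidden state -> (128, 216)
print(y.shape)                  # torch.Size([128, 216])
```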

I’m using PyTorch’s register_backward_hook function, and I’m trying to modify the gradient in the backward pass. Should I use grad_input or grad_output? Which gradient is used to update the weights of the preceding layers?
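For reference, here is a minimal sketch of the kind of gradient modification I mean, using register_full_backward_hook (where grad_input is documented as the gradient with respect to the module’s inputs); the 0.5 scaling factor and the halve_grad helper are just examples of mine:

```python
import torch
import torch.nn as nn

linear = nn.Linear(100, 216)

# Returning a tuple from a full backward hook replaces grad_input,
# i.e. the gradient that flows back to the layer *before* this one.
def halve_grad(module, grad_input, grad_output):
    return tuple(g * 0.5 if g is not None else None for g in grad_input)

linear.register_full_backward_hook(halve_grad)

x = torch.randn(128, 100, requires_grad=True)
linear(x).sum().backward()
print(x.grad.shape)  # torch.Size([128, 100]) -- gradient reaching x, halved
```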

I printed out the gradient shapes. What do the values in grad_input and grad_output mean?

length of grad input 3
grad input 0 shape torch.Size([216])
grad input 1 shape torch.Size([128, 100])
grad input 2 shape torch.Size([100, 216])

length of grad output 1
grad output 0 shape torch.Size([128, 216])

Any help would be appreciated.