Hi,

I’m trying to understand LSTMs and how backpropagation works with them. I have a simple architecture:

```
import torch
import torch.nn as nn

class LSTMGenerator(nn.Module):
    def __init__(self, input_dim=100, hidden_dim=100, output_dim=216, num_layers=2):
        super(LSTMGenerator, self).__init__()
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.output_dim = output_dim
        self.num_layers = num_layers
        self.layer1 = nn.LSTM(self.input_dim, self.hidden_dim, self.num_layers, bidirectional=False, dropout=0.5)
        self.out = nn.Linear(self.hidden_dim, self.output_dim)

    def forward(self, x):
        # x: (seq_len, batch, input_dim) for a non-batch-first LSTM
        lstm_out, _ = self.layer1(x)
        return self.out(lstm_out)
```

I’m using PyTorch’s `register_backward_hook` function to modify the gradient in the backward pass. Should I modify `grad_input` or `grad_output`? Which of these gradients is actually used to update the layer’s weights?

I printed the gradient shapes inside the hook. What do the entries of `grad_input` and `grad_output` mean?

```
length of grad_input: 3
grad_input[0] shape: torch.Size([216])
grad_input[1] shape: torch.Size([128, 100])
grad_input[2] shape: torch.Size([100, 216])
length of grad_output: 1
grad_output[0] shape: torch.Size([128, 216])
```
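For context, here is a minimal sketch of how I register the hook and print these shapes. The standalone `nn.Linear(100, 216)` and the batch size of 128 are assumptions chosen to reproduce the printout above (the shapes suggest the hook fires on the `out` layer). Note that newer PyTorch versions deprecate `register_backward_hook` in favor of `register_full_backward_hook`:

```python
import torch
import torch.nn as nn

# Assumption: a Linear layer matching `self.out` above, with batch size 128.
linear = nn.Linear(100, 216)

def hook(module, grad_input, grad_output):
    # grad_output: gradients of the loss w.r.t. this module's outputs.
    # grad_input: gradients w.r.t. the module's inputs (for the legacy hook on
    # Linear, this tuple also includes bias/weight gradients).
    print("length of grad_input:", len(grad_input))
    for i, g in enumerate(grad_input):
        print(f"grad_input[{i}] shape:", None if g is None else g.shape)
    print("length of grad_output:", len(grad_output))
    for i, g in enumerate(grad_output):
        print(f"grad_output[{i}] shape:", g.shape)

linear.register_backward_hook(hook)

x = torch.randn(128, 100, requires_grad=True)
linear(x).sum().backward()
```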

Any help would be appreciated.