Trouble with simple RNN

I’m trying to create my own toy example. I want a single RNN unit that takes a 1d input and has a 1d hidden state which gets updated.

I set the rnn block to be

import torch
import torch.nn as nn

rnn_b = nn.RNN(input_size=1, hidden_size=1, num_layers=1, bias=False)

Then my net is as follows:

class MyRNN(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(MyRNN, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size

    def forward(self, input, hidden):
        out, hidden = rnn_b(input, hidden)  # update the hidden state using the global rnn_b defined above
        return hidden

Next, I do

rnn = MyRNN(1, 1)

Now when I try to train, I do

criterion = nn.MSELoss()
learning_rate = 0.0005
hidden = torch.tensor([[[1.0]]])
def train(input, target):
    rnn.zero_grad()
    loss = 0
    global hidden
    hidden = rnn(input, hidden)
    l = criterion(hidden, target)
    loss += l
    loss.backward()
    for p in rnn.parameters():
        p.data.add_(-learning_rate, p.grad.data)  # SGD step: p -= learning_rate * grad
    return hidden, loss

I would like to train it to learn some sequence of numbers, say

inputs = [float(x) for x in range(1, 1001)]   # 1, 2, 3, 4, ..., 1000
targets = [float(x) for x in range(2, 1001)]  # 2, 3, ..., 1000 (already shifted by one)
for i in range(0, 999):
    hidden, loss = train(torch.tensor([[[inputs[i]]]]), torch.tensor([[[targets[i]]]]))

This gives me the error: "Trying to backward through the graph a second time, but the buffers have already been freed." I have seen many threads about this error, but I don’t quite understand it. Besides that, I get the feeling I am not doing things correctly here, so please give me some advice :slight_smile:

After the first call to train, hidden becomes a tensor that has requires_grad=True and is a part of the computation graph.

During the second call to train, the backward step tries to propagate gradients through hidden. However, since we already called backward() once, the computation graph that produced hidden has been freed and no longer exists.
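
To make that concrete, here is a minimal reproduction of the same pattern (standalone, not your exact code), where the second backward() fails because the hidden state still points into the graph that the first backward() freed:

import torch
import torch.nn as nn

rnn_b = nn.RNN(input_size=1, hidden_size=1, num_layers=1, bias=False)
x = torch.ones(1, 1, 1)          # (seq_len, batch, input_size)
hidden = torch.zeros(1, 1, 1)    # (num_layers, batch, hidden_size)

out, hidden = rnn_b(x, hidden)   # step 1
hidden.sum().backward()          # works; frees the graph built in step 1

out, hidden = rnn_b(x, hidden)   # step 2 reuses the old hidden
hidden.sum().backward()          # RuntimeError: trying to backward through the graph a second time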

Perhaps you did not mean to propagate gradients through hidden during the second call to train? Something like the following:

hidden, loss = train(torch.tensor([[[inputs[i]]]]), torch.tensor([[[targets[i+1]]]]))
hidden = hidden.detach_()  # detach it from the computation graph for the next iteration of training
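
If it helps, here is a minimal sketch of what the whole toy loop could look like with the detach in place. It uses rnn_b directly rather than the MyRNN wrapper (as written, MyRNN never assigns rnn_b to self, so rnn.parameters() is empty), float inputs, and a plain SGD step; treat it as one possible arrangement rather than the definitive version:

import torch
import torch.nn as nn

rnn_b = nn.RNN(input_size=1, hidden_size=1, num_layers=1, bias=False)
criterion = nn.MSELoss()
learning_rate = 0.0005

inputs = [float(x) for x in range(1, 1001)]   # 1, 2, ..., 1000
targets = [float(x) for x in range(2, 1001)]  # 2, 3, ..., 1000

hidden = torch.tensor([[[1.0]]])
for i in range(len(targets)):
    rnn_b.zero_grad()
    out, hidden = rnn_b(torch.tensor([[[inputs[i]]]]), hidden)
    loss = criterion(hidden, torch.tensor([[[targets[i]]]]))
    loss.backward()
    with torch.no_grad():             # plain SGD step on the RNN weights
        for p in rnn_b.parameters():
            p -= learning_rate * p.grad
    hidden = hidden.detach()          # cut the graph so each backward only covers one step

With the detach at the end of each iteration, every call to backward() only has to traverse the one-step graph built in that iteration, so the error goes away.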