Hi, I’m training a model consisting of an nn.LSTM feeding into an nn.Linear, for a regression problem. However, training becomes really slow even within the first epoch. I think my problem is that I’m using `retain_graph=True`, and also that the graph of my hidden state grows at each iteration, since I feed it back into my model every iteration. If I try not using `retain_graph=True`, it throws the error “Trying to backward through the graph a second time, but the buffers have already been freed”. Since I’m new to RNNs I’m a bit confused: is it supposed to be like this, or am I missing something in my approach?

Here’s the relevant code:

```python
def learn(X, y, hidden):
    model.zero_grad()
    output, hidden = model(X, hidden)
    loss = criterion(output, y)
    optimizer.zero_grad()
    loss.backward(retain_graph=True)
    optimizer.step()
    return loss, output, hidden

hidden = None
for epoch in range(epochs):
    loss_avg = 0
    for i, (X, y) in enumerate(dataloader):
        X = X.to(device)
        y = y.to(device)
        loss, output, hidden = learn(X, y, hidden)
        ...
```

And the definition of my model follows:

```python
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size=1, num_layers=1, batch_size=1):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.input_size = input_size
        self.batch_size = batch_size
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers)
        self.mlp = nn.Linear(hidden_size, output_size)

    def forward(self, X, H=None):
        out, hidden = self.lstm(X, H)
        out = self.mlp(out[-1])
        return out, hidden
```

EDIT (Solved):

The problem was that I was unintentionally backpropagating through the hidden state’s entire history, keeping all of its graphs around during training. I solved it by detaching the hidden state from the graph before returning it, inside my `learn` function.
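
Roughly, the fixed `learn` ends up looking like the sketch below (a minimal version under the same setup as above; since nn.LSTM returns its hidden state as an `(h, c)` tuple, both tensors get detached):

```python
def learn(X, y, hidden):
    optimizer.zero_grad()
    output, hidden = model(X, hidden)
    loss = criterion(output, y)
    loss.backward()  # no retain_graph=True needed once the hidden state is detached
    optimizer.step()
    # nn.LSTM returns hidden as an (h_n, c_n) tuple; detach both tensors so the
    # next iteration starts a fresh graph instead of growing the old one
    hidden = tuple(h.detach() for h in hidden)
    return loss, output, hidden
```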