Also, I get this error on the second iteration of the for loop (which iterates over the minibatches):
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor []] is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
Check whether you are reusing outputs or activations from a previous iteration without detaching them from the old computation graph, as that would also cause this issue, as shown in this small example:
import torch
import torch.nn as nn

model = nn.Linear(10, 10)
data = torch.randn(1, 10)
target = torch.randn(1, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=1.)
criterion = nn.MSELoss()
# iter0 - works
optimizer.zero_grad()
out = model(data)
loss = criterion(out, target)
loss.backward()
optimizer.step()
# iter1 - works for a new forward pass and thus a new computation graph
optimizer.zero_grad()
out = model(data)
loss = criterion(out, target)
loss.backward()
optimizer.step()
# iter2 - fails since you are trying to backpropagate through iter2 and iter1
optimizer.zero_grad()
out = model(out) # !!! reuses `out`, which is still attached to iter1's graph
loss = criterion(out, target)
loss.backward()
# RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
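The fix is to detach the reused activation: detach() returns a tensor that shares the same data but is cut out of the old computation graph, so each backward() only walks through the current iteration's graph. A minimal sketch using the same toy setup as above:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 10)
data = torch.randn(1, 10)
target = torch.randn(1, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=1.)
criterion = nn.MSELoss()

# iter0 - plain forward/backward pass
optimizer.zero_grad()
out = model(data)
loss = criterion(out, target)
loss.backward()
optimizer.step()

# iter1 - reusing `out` is fine once it is detached, since the detached
# tensor has no grad_fn and backward() stops there instead of trying to
# traverse iter0's already-freed graph
optimizer.zero_grad()
out = model(out.detach())
loss = criterion(out, target)
loss.backward()  # works
optimizer.step()
```

Note that if you instead kept the old graph alive via loss.backward(retain_graph=True) without detaching, optimizer.step()'s in-place parameter update would then typically raise the exact "modified by an inplace operation ... is at version 1; expected version 0" error from your traceback, since the stale graph still holds the parameters at their old version.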