Hello,
I am trying to implement a Neural ODE in PyTorch and I am getting the following error. I am new to PyTorch and cannot work out where the error is coming from.
The error is:
one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [100, 3]] is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
This is my code snippet below.
    class NeuralODE(nn.Module):
        def __init__(self,x1,x2):
            super(NeuralODE,self).__init__()
            self.x1 = x1
            self.x2 = x2
            l = 100
            self.f = nn.Sequential(nn.Linear(3,l),nn.Tanh(),nn.Linear(l,1))
        def forward(self,t,x):
            ifunc = self.x1
            t_numpy = np.array([t.item()]) if t.dim() == 0 else t.detach().numpy()
            i = torch.DoubleTensor([ifunc(t_numpy)]).reshape(-1,1)
            x2 = (self.x2).clone().reshape(-1,1)
            #dy/dt
            C = 5
            dy_dt = (-1/C)*i
            S = x[0,:].clone().reshape(-1,1)
            T = x[1,:].clone().reshape(-1,1)
            i_S_T = torch.cat((i,S,T),dim=1)
            # dp/dt
            C1 = 0.0015397895
            C2 = 0.020306583
            Q = self.f(i_S_T) # Finding Q through the neural network
            dp_dt = (-C1*(T-x1))+(C2*Q)
            return torch.cat((dy_dt,dp_dt),dim=0)
The reported shape of the tensor, [100, 3], can be a useful hint – see
below.
For some explanation about how inplace-modification errors occur and
some techniques to debug them, see this post:
You can sometimes “automatically” fix such errors by using pytorch’s sweep-inplace-modification-errors-under-the-rug context manager, but
it’s probably good practice to track down and understand the cause of
the error, even if you do use this solution.
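For reference, the context manager mentioned above is presumably `torch.autograd.graph.allow_mutation_on_saved_tensors()` (that identification is an assumption, since the original link is not shown here, and it needs a reasonably recent PyTorch). A minimal sketch of what it does, using a toy tensor rather than the NeuralODE from the question:

```python
import torch

a = torch.ones(2, 3, requires_grad=True)

# Assumption: this is the "under-the-rug" context manager referred to above.
with torch.autograd.graph.allow_mutation_on_saved_tensors():
    b = a.clone()
    out = (b ** 2).sum()   # pow saves b for its backward pass
    b.sin_()               # in-place change to a tensor autograd has saved
    out.backward()         # backward uses a copy of b's original values

print(a.grad)
```

Without the context manager, the same sequence should raise the inplace-modification error, so this hides the symptom rather than explaining where the modification comes from.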
It’s quite possible that the shape-[100, 3] tensor reported in the error
message is self.f[0].weight (that is, Linear (3, l).weight). You
are probably training your NeuralODE and you should be aware that
calling optimizer.step() on your model will modify that Linear (3, l).weight
inplace. Are you using .backward (retain_graph = True) anywhere?
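To make that mechanism concrete, here is a minimal sketch of how `retain_graph=True` combined with `optimizer.step()` can produce this error. The model, data, and loss are hypothetical stand-ins (a plain `Sequential` with the same layer sizes), not the NeuralODE from the question:

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(3, 100), nn.Tanh(), nn.Linear(100, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 3)                 # hypothetical input batch
loss = model(x).pow(2).mean()         # one forward pass builds one graph

version_before = model[2].weight._version
loss.backward(retain_graph=True)      # keeps the graph and its saved tensors alive
opt.step()                            # updates the weights in place
version_after = model[2].weight._version

error_message = None
try:
    loss.backward()                   # reuses the old graph after the weights changed
except RuntimeError as err:
    error_message = str(err)

print("weight version:", version_before, "->", version_after)
print(error_message)

# A fresh forward pass builds a new graph from the updated weights,
# so no retain_graph is needed.
opt.zero_grad()
new_loss = model(x).pow(2).mean()
new_loss.backward()
```

If this is what is happening in your training loop, the usual remedy is to recompute the forward pass after every `opt.step()` rather than retaining and reusing the old graph.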
I am using loss.backward(retain_graph = True) to backpropagate the loss. When I use just loss.backward() instead, it shows the following error:
Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
I don’t understand how to work around this. Any help would be much appreciated.
I printed out the versions of the weights and biases of the neural network. They change from 1 to 2 in the next epoch, which matches my error message saying that the expected version is 1.
I modified the neural network in my code as shown below, adding clone to the weights. But it did not change the outcome; it still shows the same error.
Just to be clear, to me “epoch” means that you have iterated over the
entire training set once, but that the training set consists of many batches.
Typically, you would perform one optimization step for each batch:
    input, labels = get_batch (...)   # one batch, not whole training set
    output = model (input)            # forward pass
    loss = loss_fn (output, labels)
    opt.zero_grad()
    loss.backward()                   # backward pass
    opt.step()                        # optimization step
If your error truly shows up only after an entire epoch, rather than occurring
after just a single batch, then you are likely doing something incorrect at the
end (or beginning) of the loop over batches that makes up an epoch.
(A common approach is to compute performance metrics for your validation
set and maybe also for your training set after each epoch. It would likely be
an error if any such per-epoch computations contained a loss.backward()
or opt.step() and something like that could be causing your issue.)
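As a rough illustration of that structure, here is a runnable sketch with one optimization step per batch and a per-epoch validation pass that contains no `backward()` or `step()`. The model, the random data, and the batch size are all hypothetical placeholders:

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(3, 100), nn.Tanh(), nn.Linear(100, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

# Hypothetical toy data standing in for the real training and validation sets.
train_x, train_y = torch.randn(64, 3), torch.randn(64, 1)
val_x, val_y = torch.randn(16, 3), torch.randn(16, 1)
batch_size = 16

val_losses = []
for epoch in range(3):
    model.train()
    for start in range(0, len(train_x), batch_size):
        inputs = train_x[start:start + batch_size]
        labels = train_y[start:start + batch_size]
        output = model(inputs)            # fresh forward pass for every batch
        loss = loss_fn(output, labels)
        opt.zero_grad()
        loss.backward()                   # no retain_graph needed
        opt.step()

    # Per-epoch metrics: evaluation only, no graph is built.
    model.eval()
    with torch.no_grad():
        val_losses.append(loss_fn(model(val_x), val_y).item())

print(val_losses)
```

Because each batch gets its own forward pass, every `backward()` runs on a graph built from the current weights, and the graph can be freed immediately afterwards.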
Moving on from the question of whether your error happens within your
loop over batches or only once per epoch:
Do you have more than one loss.backward() (or similar) line of code?
If so, why?
Do you have more than one opt.step() (or similar) line of code, and if
so, why?
In your original post, self.f is a Sequential. But in the code you post
below, you have self.lin1 and self.lin2. Let me use lin1 and lin2
to be concrete:
Please add (where model refers to the instance of NeuralODE that you
are training):
    print ('model.lin1.weight._version:', model.lin1.weight._version)
    print ('model.lin2.weight._version:', model.lin2.weight._version)
    output = model (input)   # or whatever
    print ('model.lin1.weight._version:', model.lin1.weight._version)
    print ('model.lin2.weight._version:', model.lin2.weight._version)
Please post the exact code fragments where you make these calls and
please post the exact output you get from the print statements.
Please also run with a with torch.autograd.detect_anomaly(): context
manager and post the full inplace-modification error message, including the
forward-call Traceback that anomaly detection gives you.
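In case it helps, here is a small sketch of how anomaly detection is typically wrapped around the forward and backward passes. The tensors are toy placeholders chosen only to trigger an inplace-modification error on purpose:

```python
import torch

w = torch.randn(3, requires_grad=True)

error_message = None
with torch.autograd.detect_anomaly():
    try:
        h = w.exp()          # exp saves its result for the backward pass
        h.add_(1)            # in-place change to that saved result
        h.sum().backward()   # backward notices the version mismatch
    except RuntimeError as err:
        error_message = str(err)

print(error_message)
```

With anomaly detection enabled, PyTorch should additionally emit a warning containing the traceback of the forward call whose backward failed (here, the `exp()` line), and that traceback is the part worth posting. Anomaly detection slows things down, so it is best used only while debugging.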
If it fits within, say, ten or twenty lines, please post your exact code where
you execute the equivalent of:
    output = model (input)
    loss = loss_fn (output, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
Also, just to double-check, please print out model.lin1.weight.shape and model.lin2.weight.shape at some point after you instantiate model.
This would be a correct way to fix certain specific inplace-modification
errors, but maybe your error has a somewhat different cause.
As an aside, please use three backticks, ```, to correctly format your code
and output text.