Tensor.gather triggers in-place operation error when calling backward

PyTorch 1.2 on Linux with CUDA 10.0.

I got the trace by enabling torch.autograd.set_detect_anomaly(True).

The error is:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.LongTensor [2, 1, 1024]] is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

and it points to

states = hidden_states.gather(-2, positions)

Any ideas? Thanks.


For in-place errors, anomaly mode points to the operation that saved the Tensor for later use, not the one that modified it in place.

But given that it complains about a LongTensor, I'm fairly confident that you change positions in place later in your code. That would cause the issue, because gather() needs these positions to compute its backward.
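A minimal sketch of how this kind of error can arise, run on CPU with made-up shapes (the tensors below are illustrative assumptions, not the original poster's model). The index tensor is modified in place after gather() has saved it, and backward() then fails:

```python
import torch

# Illustrative shapes only; the original model's tensors are not known.
hidden_states = torch.randn(2, 4, 8, requires_grad=True)
positions = torch.zeros(2, 1, 8, dtype=torch.long)

error_message = None
try:
    # Anomaly mode records the forward traceback of the op whose backward fails.
    with torch.autograd.detect_anomaly():
        states = hidden_states.gather(-2, positions)  # gather saves `positions`
        positions.add_(1)                             # in-place change afterwards
        states.sum().backward()                       # version check fails here
except RuntimeError as exc:
    error_message = str(exc)

print(error_message)
```

The traceback reported by anomaly mode points at the gather() line, even though the offending statement is the add_() call after it.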

Thanks a lot for your reply Alban.

I checked the positions variable. I only change it in one place inside forward():

positions = positions.unsqueeze(-1).expand(-1, -1, fdim)

and the same error still occurs. If I don't pass positions in as an input argument, the error disappears.

Also, I'm wondering what the version of a variable means. I get different version numbers if I change some of the operations on positions.

Thanks again.

The version tracks how many times the tensor has been changed in place.
In-place modifications are anything like positions[foo] = or positions.add_(foo) or positions += 2.
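As a rough illustration, the counter can be inspected through the internal `_version` attribute of a tensor (an underscore-prefixed attribute, so treat it as a debugging aid rather than a stable API). The tensor below is a made-up example:

```python
import torch

positions = torch.zeros(2, 3, dtype=torch.long)
versions = [positions._version]      # starting value

positions.add_(1)                    # in-place method
versions.append(positions._version)

positions += 2                       # in-place operator
versions.append(positions._version)

positions[0] = 5                     # indexed assignment
versions.append(positions._version)

shifted = positions + 1              # out-of-place: creates a new tensor
versions.append(positions._version)  # unchanged by the line above

print(versions)
```

Each in-place statement increases the counter, while the out-of-place addition leaves it alone.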
A simple fix is to change your gather call to:

states = hidden_states.gather(-2, positions.clone())

The clone here makes sure that the tensor saved by gather does not share memory with the one used in the rest of your code. That way, in-place operations in the rest of your code won't be a problem for gather.
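A small sketch of why the clone helps, under assumed shapes and a hypothetical `raw_positions` tensor standing in for whatever is passed into forward(). The unsqueeze/expand result is a view, so it shares both memory and the version counter with its base, while the clone is independent:

```python
import torch

fdim = 8
hidden_states = torch.randn(2, 4, fdim, requires_grad=True)

# Hypothetical input tensor; the real one comes from the caller.
raw_positions = torch.tensor([[0], [2]])
positions = raw_positions.unsqueeze(-1).expand(-1, -1, fdim)  # a view of raw_positions

# gather saves the clone, which has its own memory and version counter.
states = hidden_states.gather(-2, positions.clone())

view_version_before = positions._version
raw_positions.add_(1)                    # in-place change somewhere else in the code
view_version_after = positions._version  # the view's counter moved with its base

states.sum().backward()                  # succeeds: the saved clone is untouched
print(view_version_before, view_version_after)
```

Without the clone(), the same in-place change to `raw_positions` would invalidate the tensor gather saved, since the expanded view and its base share one version counter.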


Thanks Alban. I used clone() and it works now. I'm wondering whether this will affect the backward behavior.

No, the gradients will flow through the clone op and will be computed properly.
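One way to check this on a toy case (shapes and values are assumptions for illustration): compare the gradient of hidden_states with and without the clone on the index, and confirm that clone() passes gradients through for a floating-point tensor. Note that an integer index tensor such as positions does not receive gradients itself.

```python
import torch

torch.manual_seed(0)
positions = torch.randint(0, 4, (2, 1, 8))

hidden_a = torch.randn(2, 4, 8, requires_grad=True)
hidden_b = hidden_a.detach().clone().requires_grad_(True)

# Same computation, once with the original index and once with a clone of it.
hidden_a.gather(-2, positions).pow(2).sum().backward()
hidden_b.gather(-2, positions.clone()).pow(2).sum().backward()
grads_match = torch.equal(hidden_a.grad, hidden_b.grad)

# clone() is differentiable: the gradient reaches x through it.
x = torch.randn(3, requires_grad=True)
(x.clone() * 2).sum().backward()

print(grads_match, x.grad)
```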
