I got the traceback by enabling torch.autograd.set_detect_anomaly(True).
The error shows the following:

```
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.LongTensor [2, 1, 1024]] is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
```
For inplace errors, anomaly mode points to the operation that saved the Tensor for later use, not to the one that modified it in place.
But given that it complains about a LongTensor, I'm fairly confident that you modify positions in place later in your code. That would cause the issue, since gather() needs these positions to compute its backward pass.
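To make this concrete, here is a minimal sketch (the shapes and variable names are made up, not taken from your code) that reproduces the same failure mode:

```python
import torch

hidden_states = torch.randn(2, 4, 8, requires_grad=True)
positions = torch.zeros(2, 1, 8, dtype=torch.long)

states = hidden_states.gather(-2, positions)  # gather saves positions for its backward
positions += 1                                # in-place change bumps positions' version

states.sum().backward()
# RuntimeError: one of the variables needed for gradient computation has been
# modified by an inplace operation: [torch.LongTensor [2, 1, 8]] ...
```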
The version counter tracks how many times a Tensor has been changed in place.
In-place modifications are anything like positions[foo] = ..., positions.add_(foo), or positions += 2.
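If you want to watch the counter tick, every tensor exposes it as the _version attribute (an internal implementation detail, so don't rely on it in real code):

```python
import torch

positions = torch.zeros(3, dtype=torch.long)
print(positions._version)  # 0

positions[0] = 5   # indexed assignment is in-place
positions.add_(1)  # trailing-underscore methods are in-place
positions += 2     # augmented assignment is in-place

print(positions._version)  # 3
```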
A simple fix is to replace your gather call with:

```python
states = hidden_states.gather(-2, positions.clone())
```
The clone here makes sure that the version used by gather and the one used in the rest of your code don't share memory. That way, in-place operations in the rest of your code won't be a problem for gather's backward.
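Putting it together, the earlier sketch (same made-up shapes) runs cleanly once gather gets its own copy:

```python
import torch

hidden_states = torch.randn(2, 4, 8, requires_grad=True)
positions = torch.zeros(2, 1, 8, dtype=torch.long)

states = hidden_states.gather(-2, positions.clone())  # gather saves the clone
positions += 1                                        # no longer touches the saved tensor

states.sum().backward()          # succeeds
print(hidden_states.grad.shape)  # torch.Size([2, 4, 8])
```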