Rewriting an in-place operation

I’m writing a custom loss function and am currently debugging it with gradcheck(), which throws this error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor []], which is output 0 of SelectBackward, is at version 9; expected version 8 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

I understand what the error means. Turning on anomaly detection pointed me to this piece of code:

if XYZ[2] > 0.008856:
    XYZ[2] = XYZ[2] ** (1 / 3)
else:
    XYZ[2] = (7.787 * XYZ[2]) + (16 / 116)
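
For reference, I’m running the checks roughly like this (simplified; my_loss stands in for my actual loss function):

import torch

torch.autograd.set_detect_anomaly(True)  # makes the backward error point at the forward op that created the failing node

inp = torch.rand(3, dtype=torch.double, requires_grad=True)  # gradcheck wants double precision
torch.autograd.gradcheck(my_loss, (inp,))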

How would I correctly rewrite this so that it is not in place, but also doesn’t introduce overhead? Since this is going to be part of my loss function, I need it to be as efficient as possible. Thanks for any comments!

If your re-write works, I’d just go with that. If you wanted to keep this operation in-place, you’d also have to manually rewrite its backward computation to handle the in-place update, which is hard. From the autograd docs on in-place operations:

Supporting in-place operations in autograd is a hard matter, and we discourage their use in most cases. Autograd’s aggressive buffer freeing and reuse makes it very efficient and there are very few occasions when in-place operations actually lower memory usage by any significant amount. Unless you’re operating under heavy memory pressure, you might never need to use them.
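
If you haven’t found a re-write you like yet, one out-of-place option for this kind of threshold (the constants look like the CIE Lab f(t) function) is torch.where, which builds a new tensor and stays vectorized. A minimal sketch, assuming XYZ is a non-negative float tensor that requires grad:

import torch

def lab_f(XYZ):
    # torch.where evaluates both branches, so clamp first to keep the
    # cube-root gradient finite even for entries the mask does not select
    safe = XYZ.clamp(min=1e-6)
    return torch.where(XYZ > 0.008856, safe ** (1 / 3), 7.787 * XYZ + 16 / 116)

This also keeps the whole conversion vectorized, so you don’t pay a Python-level branch per channel.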


OK, I see. But I’m still not sure whether I’m doing it correctly. When I do something like this:

temp = Variable(XYZ[0].data, requires_grad=True)
if XYZ[0] > 0.008856:
    XYZ[0] = temp ** (1 / 3)
else:
    XYZ[0] = (7.787 * temp) + (16 / 116)

then I get another error:

RuntimeError: Jacobian mismatch for output 0 with respect to input 0,
numerical:tensor([[ 0.0297,  0.1332,  0.0469],
        [ 0.0084, -0.0171,  0.0123],
        [ 0.0104,  0.0730, -0.1986]], dtype=torch.float64)
analytical:tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]], dtype=torch.float64)

Whereas when I do this, gradcheck returns True:

temp = XYZ[0].clone()
if temp > 0.008856:
    XYZ[0] = temp ** (1 / 3)
else:
    XYZ[0] = (7.787 * temp) + (16 / 116)

What exactly is the difference between the two? And is the second one correct?

Ah, I see. When you write something like XYZ[0] = ..., this is still an in-place operation. The other difference between your two snippets is that building temp from XYZ[0].data detaches it from the graph, so no gradient can flow back to XYZ (hence the all-zero analytical Jacobian), whereas .clone() keeps temp connected to XYZ.
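
You can see that difference by looking at grad_fn; a quick sketch:

import torch

XYZ = torch.tensor([0.008, 0.009, 1.0], requires_grad=True)

temp_data = XYZ[0].detach()   # what building temp from XYZ[0].data amounts to: cut off from XYZ
temp_clone = XYZ[0].clone()   # stays connected to XYZ in the graph

print(temp_data.grad_fn)   # None -> gradients cannot flow back to XYZ
print(temp_clone.grad_fn)  # a CloneBackward node -> gradients flow back to XYZ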

My recommendation for conditionally transforming values in your loss function (I assume XYZ comes from a network output) is to create a new tensor for each index:

Here’s an example for a tensor XYZ of shape [3]:

import torch
import torch.nn as nn

XYZ = nn.Parameter(torch.tensor([0.008, 0.009, 1.0]))

# each branch result goes into a fresh tensor instead of being written back into XYZ
if XYZ[0] > 0.008856:
    at_0 = XYZ[0] ** (1 / 3)
else:
    at_0 = (7.787 * XYZ[0]) + (16 / 116)

if XYZ[1] > 0.008856:
    at_1 = XYZ[1] ** (1 / 3)
else:
    at_1 = (7.787 * XYZ[1]) + (16 / 116)

at_2 = XYZ[2]

# recombine the per-index results; autograd connects them back to XYZ
test = torch.stack([at_0, at_1, at_2])
test.sum().backward()

Autograd keeps track of the tensor indexing automatically, too:

print(XYZ[1])
tensor(0.0090, grad_fn=<SelectBackward>)
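
And if you want to double-check the whole thing, you can wrap the conditional in a small function and run gradcheck on it (gradcheck wants double-precision inputs); a sketch:

import torch

def f(XYZ):
    at_0 = XYZ[0] ** (1 / 3) if XYZ[0] > 0.008856 else 7.787 * XYZ[0] + 16 / 116
    at_1 = XYZ[1] ** (1 / 3) if XYZ[1] > 0.008856 else 7.787 * XYZ[1] + 16 / 116
    return torch.stack([at_0, at_1, XYZ[2]])

inp = torch.tensor([0.008, 0.009, 1.0], dtype=torch.double, requires_grad=True)
print(torch.autograd.gradcheck(f, (inp,)))  # True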