Masked_fill runtime error about inplace operation

cm8908 · July 27, 2022, 9:15am

Hi everyone. I am working on a transformer model that iteratively outputs the probability of selecting each node from N categories. I’d like to set the probability of already-selected nodes to zero so that they don’t get selected again. I have run into a runtime error at loss.backward() with the message:

one of the variables needed for gradient computation has been modified by an inplace operation: [CPUBoolType [1, 512, 25]] is at version 25; expected version 24 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

I can reproduce this error with the following code:

import torch
from torch import nn
torch.autograd.set_detect_anomaly(True)
log_probs = []
layer = nn.Linear(28, 28)
linear = nn.Linear(28, 5)
h_t = torch.randn(1, 10, 28)
mask = torch.zeros(1, 10, 5).bool()
for i in range(5):
    h_t = layer(h_t)
    prob = linear(h_t)
    prob = prob.masked_fill(mask, 0)
    city = prob.argmax(-1)  # Cat?
    mask[:,torch.arange(10),city] = True
    log_probs.append(prob.log())
L_diff = 10
slp = torch.cat(log_probs, dim=0).sum(dim=0)
loss = L_diff * slp
loss.mean().backward()

I’m pretty sure this is caused by the masked_fill operation, judging by the message from autograd’s anomaly detection:

/opt/conda/envs/pytorch/lib/python3.10/site-packages/torch/autograd/__init__.py:173: UserWarning: Error detected in MaskedFillBackward0. Traceback of forward call that caused the error:
  File "/home/gailab/ms/tspxl/temp.py", line 12, in <module>
    prob = prob.masked_fill(mask, 0)
 (Triggered internally at /opt/conda/conda-bld/pytorch_1646755897462/work/torch/csrc/autograd/python_anomaly_mode.cpp:104.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
Traceback (most recent call last):
  File "/home/gailab/ms/tspxl/temp.py", line 19, in <module>
    loss.mean().backward()
  File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/torch/_tensor.py", line 363, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass

How can I fix this error? I doubt that masked_fill should simply be avoided, because I’ve seen plenty of code that uses masked_fill and works correctly. Any help would be appreciated.

ptrblck · July 28, 2022, 12:35am

The error is raised in the masked_fill operation, which saves mask for its backward pass, but the actual cause is the in-place modification of mask in:

mask[:,torch.arange(10),city] = True

Assuming you don’t want to reuse the same mask you could recreate it:

    ...
    mask = torch.zeros(1, 10, 5).bool()
    mask[:,torch.arange(10),city] = True

and it should work.
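
To make the mechanism concrete, here is a minimal sketch (not the original model) of why the in-place write breaks the backward pass. It relies on the tensor attribute `_version`, which exposes autograd's internal version counter; the tensor names and shapes are made up for illustration.

```python
import torch

x = torch.randn(2, 3, requires_grad=True)
mask = torch.zeros(2, 3, dtype=torch.bool)

out = x.masked_fill(mask, 0)     # autograd saves `mask` for the backward pass
version_saved = mask._version    # version counter at the time of the forward call

mask[0, 0] = True                # in-place write bumps the version counter
version_after = mask._version

try:
    out.sum().backward()         # saved `mask` no longer matches its saved version
    failed = False
except RuntimeError as err:
    failed = True
    print(err)                   # "... has been modified by an inplace operation ..."
```

If `mask` is replaced by a fresh tensor before the write, the tensor saved by masked_fill is left untouched and backward() should run normally.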

cm8908 · July 28, 2022, 8:41am

Cloning the mask fixed it. Thank you very much for your help!
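
For anyone finding this later, the cloning variant might look roughly like the sketch below. It follows the reproduction code from the first post, with two simplifications that are my own assumptions: the `.log()` call is dropped (raw linear outputs can be negative, so the log would produce NaNs), and the list is renamed to `outputs`.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(28, 28)
linear = nn.Linear(28, 5)
h_t = torch.randn(1, 10, 28)
mask = torch.zeros(1, 10, 5, dtype=torch.bool)
outputs = []  # hypothetical name; the original post collects log-probabilities

for _ in range(5):
    h_t = layer(h_t)
    prob = linear(h_t).masked_fill(mask, 0)  # autograd saves this `mask` tensor
    city = prob.argmax(-1)
    outputs.append(prob)
    # Clone first, so the tensor saved by masked_fill above is never modified.
    mask = mask.clone()
    mask[:, torch.arange(10), city] = True

loss = torch.cat(outputs, dim=0).sum(dim=0).mean()
loss.backward()  # runs without the in-place error
```

Unlike recreating the mask with torch.zeros at every step, cloning keeps the nodes selected in earlier iterations masked, which seems to be what the original question needs.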