Hi everyone. I am working on a transformer model that iteratively outputs the probabilities of selecting a node from N categories. I'd like to set the probability of every already-selected node to zero so that it can't be selected again. However, loss.backward() raises a runtime error with this message:
```
one of the variables needed for gradient computation has been modified by an inplace operation: [CPUBoolType [1, 512, 25]] is at version 25; expected version 24 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
```
I can reproduce the error with the following code:
```python
import torch
from torch import nn

torch.autograd.set_detect_anomaly(True)
log_probs = []
layer = nn.Linear(28, 28)
linear = nn.Linear(28, 5)
h_t = torch.randn(1, 10, 28)
mask = torch.zeros(1, 10, 5).bool()
for i in range(5):
    h_t = layer(h_t)
    prob = linear(h_t)
    prob = prob.masked_fill(mask, 0)
    city = prob.argmax(-1)  # Cat?
    mask[:, torch.arange(10), city] = True
    log_probs.append(prob.log())
L_diff = 10
slp = torch.cat(log_probs, dim=0).sum(dim=0)
loss = L_diff * slp
loss.mean().backward()
```
I'm fairly sure the masked_fill operation is the cause, based on this output from autograd anomaly detection:
```
/opt/conda/envs/pytorch/lib/python3.10/site-packages/torch/autograd/__init__.py:173: UserWarning: Error detected in MaskedFillBackward0. Traceback of forward call that caused the error:
  File "/home/gailab/ms/tspxl/temp.py", line 12, in <module>
    prob = prob.masked_fill(mask, 0)
 (Triggered internally at /opt/conda/conda-bld/pytorch_1646755897462/work/torch/csrc/autograd/python_anomaly_mode.cpp:104.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
Traceback (most recent call last):
  File "/home/gailab/ms/tspxl/temp.py", line 19, in <module>
    loss.mean().backward()
  File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/torch/_tensor.py", line 363, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
```
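To isolate what the anomaly output points at, here is a minimal sketch (an illustration, not part of the original model). `masked_fill` keeps a reference to `mask` for its backward pass, and any later in-place write to that same tensor increments its version counter. Backward then sees a version that differs from the one recorded at save time and refuses to continue. The counter is read here through `_version`, an internal, underscore-prefixed tensor attribute used only for inspection.

```python
import torch

x = torch.randn(3, requires_grad=True)
mask = torch.tensor([False, True, False])

y = x.masked_fill(mask, 0.0)    # autograd saves `mask` for the backward pass
version_before = mask._version  # internal version counter of the saved tensor
mask[0] = True                  # in-place write on the same tensor bumps the counter
version_after = mask._version

try:
    y.sum().backward()
    error = None
except RuntimeError as err:
    error = err  # "... has been modified by an inplace operation ..."

print(version_before, version_after)
print(error)
```

In the loop above, the same thing happens with `mask[:, torch.arange(10), city] = True`: it writes into the tensor that the previous `masked_fill` call saved.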
How can I fix this error? Should masked_fill be avoided altogether? I ask because I've seen plenty of code that uses masked_fill without any problems. Any help would be appreciated.
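One possible workaround, sketched under the assumption that the rest of the model stays as in the repro: update the mask out of place, so each step works on a new tensor and the one saved by `masked_fill` is never modified. Calling `mask.clone()` before the indexed write is enough, and `masked_fill` itself does not need to be avoided. The sketch also applies `softmax` so that `prob` is a valid distribution. This is an assumption, since the repro passes raw `linear` outputs to `.log()`, which yields nan for negative values. Anomaly detection is left off because it is only a debugging aid.

```python
import torch
from torch import nn

torch.manual_seed(0)
log_probs = []
layer = nn.Linear(28, 28)
linear = nn.Linear(28, 5)
h_t = torch.randn(1, 10, 28)
mask = torch.zeros(1, 10, 5, dtype=torch.bool)
for i in range(5):
    h_t = layer(h_t)
    # Assumption: softmax added so `prob` holds probabilities (the repro used raw outputs).
    prob = linear(h_t).softmax(-1)
    prob = prob.masked_fill(mask, 0)  # saves the current `mask` for backward
    city = prob.argmax(-1)            # greedy pick among unmasked nodes
    # Copy before writing: the tensor saved by masked_fill keeps its recorded
    # version, and the next iteration uses the updated copy.
    mask = mask.clone()
    mask[:, torch.arange(10), city] = True
    log_probs.append(prob.log())
L_diff = 10
slp = torch.cat(log_probs, dim=0).sum(dim=0)
loss = L_diff * slp
loss.mean().backward()  # no in-place version error
```

Note that `prob.log()` is still -inf at masked entries, so summing every log-probability as `slp` does produces a -inf loss value. That is separate from the in-place error; you may want to keep only the log-probability of the selected node (for example with `torch.gather`).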