This error is raised by the loss function shown below.
pos_predict is the model's prediction and pos_relaxed is the ground truth. The traceback points at this line:
err += torch.sqrt(torch.sum(diff_pos[s] ** 2))
How can I rewrite it so that it no longer uses an in-place operation?
def loss_func(model, batch, h):
    pos_relaxed = batch["pos_relaxed"]
    count_i_atom = batch["count_i_atom"]
    cell = batch["cell"]
    inv_cell = batch["inv_cell"]
    # pos_predict is the data predicted by the model
    pos_predict = model(h, batch)
    n_obj = len(count_i_atom) - 1
    frac_pos_a = torch.zeros_like(pos_relaxed)
    frac_pos_b = torch.zeros_like(pos_predict)
    for i in range(n_obj):
        s = slice(count_i_atom[i], count_i_atom[i+1])
        a_inv_cell = inv_cell[(i*3):(i*3+3), :]
        frac_pos_a[s] = torch.matmul(pos_relaxed[s], a_inv_cell)
        frac_pos_b[s] = torch.matmul(pos_predict[s], a_inv_cell)
    diff_pos = frac_pos_a - frac_pos_b
    flag = diff_pos > 0.5
    diff_pos[flag] -= 1.0
    flag = diff_pos < -0.5
    diff_pos[flag] += 1.0
    err = 0
    for i in range(n_obj):
        s = slice(count_i_atom[i], count_i_atom[i+1])
        a_cell = cell[(i*3):(i*3+3), :]
        diff_pos[s] = torch.matmul(diff_pos[s], a_cell)
        err += torch.sqrt(torch.sum(diff_pos[s] ** 2))
    return err / n_obj
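One possible non-in-place form, offered as a sketch rather than a confirmed fix. My reading is that the `err +=` line is only where autograd reports the problem: `diff_pos[s] ** 2` saves a view of `diff_pos` for the backward pass, and the assignment `diff_pos[s] = torch.matmul(diff_pos[s], a_cell)` in the next loop iteration then writes into that same tensor. The version below builds a new tensor at every step instead. The name `loss_func_no_inplace` and its signature are hypothetical; it takes `pos_predict` as an argument instead of calling `model(h, batch)` so that it can run on its own.

```python
import torch


def loss_func_no_inplace(pos_predict, pos_relaxed, count_i_atom, cell, inv_cell):
    # Hypothetical variant of loss_func: it receives pos_predict directly
    # (instead of calling model(h, batch)) so that it is self-contained.
    n_obj = len(count_i_atom) - 1
    norms = []
    for i in range(n_obj):
        s = slice(int(count_i_atom[i]), int(count_i_atom[i + 1]))
        a_cell = cell[(i * 3):(i * 3 + 3), :]
        a_inv_cell = inv_cell[(i * 3):(i * 3 + 3), :]
        # Fractional coordinates are new tensors, not writes into a buffer.
        frac_a = torch.matmul(pos_relaxed[s], a_inv_cell)
        frac_b = torch.matmul(pos_predict[s], a_inv_cell)
        diff = frac_a - frac_b
        # Same wrapping as diff_pos[flag] -= 1.0 and += 1.0, without masked writes.
        diff = torch.where(diff > 0.5, diff - 1.0, diff)
        diff = torch.where(diff < -0.5, diff + 1.0, diff)
        # Back to Cartesian coordinates; the result gets its own name.
        cart = torch.matmul(diff, a_cell)
        norms.append(torch.sqrt(torch.sum(cart ** 2)))
    # Equivalent to summing the per-structure norms and dividing by n_obj.
    return torch.stack(norms).sum() / n_obj
```

With the original data this would be called as `loss_func_no_inplace(model(h, batch), batch["pos_relaxed"], batch["count_i_atom"], batch["cell"], batch["inv_cell"])`.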
The full error output is:
/mnt/d/software_install/pytorch/lib/python3.12/site-packages/torch/autograd/__init__.py:266: UserWarning: Error detected in PowBackward0. Traceback of forward call that caused the error:
  File "/mnt/e/workdir/ML_ADS/my_code/test_egnn.py", line 263, in <module>
    loss = loss_func(model, batch, h)
  File "/mnt/e/workdir/ML_ADS/my_code/test_egnn.py", line 210, in loss_func
    err += torch.sqrt(torch.sum(diff_pos[s] ** 2))
  File "/mnt/d/software_install/pytorch/lib/python3.12/site-packages/torch/_tensor.py", line 40, in wrapped
    return f(*args, **kwargs)
(Triggered internally at /opt/conda/conda-bld/pytorch_1708025569485/work/torch/csrc/autograd/python_anomaly_mode.cpp:113.)
  Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
Traceback (most recent call last):
  File "/mnt/e/workdir/ML_ADS/my_code/test_egnn.py", line 265, in <module>
    loss.backward()
  File "/mnt/d/software_install/pytorch/lib/python3.12/site-packages/torch/_tensor.py", line 522, in backward
    torch.autograd.backward(
  File "/mnt/d/software_install/pytorch/lib/python3.12/site-packages/torch/autograd/__init__.py", line 266, in backward
    Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [86, 3]], which is output 0 of AsStridedBackward0, is at version 4; expected version 3 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
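The "is at version 4; expected version 3" part of the message refers to the version counter that autograd keeps on each tensor, which a view shares with its base. A minimal reproduction, as I understand the mechanism, is sketched below. The names `x`, `buf`, `first`, `x2`, `buf2`, `first2` and `scaled` are made up for illustration and are not taken from the code above: `buf` plays the role of `diff_pos`, and the slice assignment plays the role of `diff_pos[s] = ...`.

```python
import torch

# Failing pattern: square one slice, then write into another slice of the
# same tensor before calling backward.
x = torch.ones(4, 3, requires_grad=True)
buf = x * 2.0                  # non-leaf tensor, analogous to diff_pos
first = buf[0:2] ** 2          # pow saves its input (a view of buf) for backward
version_before = buf._version
buf[2:4] = buf[2:4] * 3.0      # in-place write into buf, even though rows differ
version_after = buf._version   # the counter shared by buf and its views has moved

try:
    first.sum().backward()
    raised = False
except RuntimeError as exc:
    raised = True
    print(exc)

# Same computation with the result stored under a new name: nothing that
# autograd saved is overwritten, so backward runs.
x2 = torch.ones(4, 3, requires_grad=True)
buf2 = x2 * 2.0
first2 = buf2[0:2] ** 2
scaled = buf2[2:4] * 3.0       # new tensor, buf2 is left untouched
(first2.sum() + scaled.sum()).backward()
print(x2.grad)
```

The write that triggers the check does not have to touch the rows that were squared; any in-place change to the shared tensor after it was saved appears to be enough.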