An error on loss.backward() -> RuntimeError: could not create a primitive

Hi,
I am training a joint segmentation-classification model, and the total loss is the weighted average of the segmentation and classification losses:

    loss = w_seg * loss_seg + w_cls * loss_cls
    model.zero_grad()
    print(loss)
    loss.backward()

loss is created and printed but I have an error on loss.backward():

tensor(0.6947, grad_fn=<AddBackward0>)
Traceback (most recent call last):
  File "main.py", line 140, in <module>
    main(config)
  File "main.py", line 80, in main
    solver.train()
  File "/project/6027897/meyta/Codes/Joint/4-11/Compile_nets.py", line 346, in train
    loss.backward()
  File "/home/meyta/.local/lib/python3.8/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/meyta/.local/lib/python3.8/site-packages/torch/autograd/__init__.py", line 145, in backward
    Variable._execution_engine.run_backward(
RuntimeError: could not create a primitive

Does anyone have an idea about this error?
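
For context, here is a minimal sketch of the kind of setup I mean; the module, head, and loss names below are illustrative placeholders, not my exact code:

    import torch
    import torch.nn as nn

    # Placeholder joint model: a shared encoder with a segmentation head and a
    # classification head (the real architecture is different).
    class JointNet(nn.Module):
        def __init__(self, num_classes=2):
            super().__init__()
            self.encoder = nn.Conv2d(3, 16, 3, padding=1)
            self.seg_head = nn.Conv2d(16, num_classes, 1)
            self.cls_head = nn.Linear(16, num_classes)

        def forward(self, x):
            feats = torch.relu(self.encoder(x))
            seg_logits = self.seg_head(feats)                   # per-pixel logits
            cls_logits = self.cls_head(feats.mean(dim=(2, 3)))  # image-level logits
            return seg_logits, cls_logits

    model = JointNet()
    seg_criterion = nn.CrossEntropyLoss()
    cls_criterion = nn.CrossEntropyLoss()
    w_seg, w_cls = 0.5, 0.5

    x = torch.randn(4, 3, 64, 64)
    seg_target = torch.randint(0, 2, (4, 64, 64))
    cls_target = torch.randint(0, 2, (4,))

    seg_logits, cls_logits = model(x)
    loss_seg = seg_criterion(seg_logits, seg_target)
    loss_cls = cls_criterion(cls_logits, cls_target)

    loss = w_seg * loss_seg + w_cls * loss_cls
    model.zero_grad()
    print(loss)
    loss.backward()  # this is the call that raises "could not create a primitive" on the server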

This error was recently reported in this thread. As I couldn’t reproduce it, I’ve asked the user to create an issue on GitHub, but cannot find any issue with this error message, so could you create an issue instead with an executable code snippet (if possible), please?

Thank you for your reply. The code works properly with some versions, while with others the error changes to:

  File "main.py", line 140, in <module>
    main(config)
  File "main.py", line 80, in main
    solver.train()
  File "/project/6027897/meyta/Joint/4-11/Compile_nets.py", line 344, in train
    loss.backward()
  File "/home/meyta/ENV/lib/python3.6/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/meyta/ENV/lib/python3.6/site-packages/torch/autograd/__init__.py", line 132, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: label is too far

I tried a few systems and versions, but I couldn’t pinpoint the exact source of the error, so reproducing it is not straightforward for me. Right now the code works on my system, but I still get the error on the server (Compute Canada).

That’s interesting, as I cannot find any references for this error message. Were you able to find any similar issue?
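
In the meantime, it might help to compare the two environments (your local machine vs. the cluster node). This is just generic environment probing, nothing specific to your code:

    import torch

    print(torch.__version__)        # exact PyTorch version on this machine
    print(torch.version.cuda)       # CUDA version the binary was built with (None for CPU-only builds)
    print(torch.__config__.show())  # build configuration, incl. MKL / MKL-DNN and CPU capability

Running python -m torch.utils.collect_env on both machines and posting the outputs would also make it easier to spot a difference between the working and failing setups.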

  File "run.py", line 83, in <module>
    learn.fit_one_cycle(epochs=30,max_lr=1e-3)
  File "C:\Users\v.huseynov\PycharmProjects\dim\learn.py", line 85, in fit_one_cycle
    self._fit(epochs=epochs, cyclic=True,)
  File "C:\Users\v.huseynov\PycharmProjects\dim\learn.py", line 137, in _fit
    loss.backward()
  File "C:\ProgramData\Anaconda3\envs\dim\lib\site-packages\torch\_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "C:\ProgramData\Anaconda3\envs\dim\lib\site-packages\torch\autograd\__init__.py", line 156, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: could not create a primitive

Hi, I got this error when running the model on the CPU. When I track the run in Colab, I can see that at the start of the second epoch the RAM usage increases dramatically. Could the problem be RAM?

I don’t know, as I wasn’t able to reproduce the issue or find another report with the same error message.
Based on this post I could find some references to potentially missing CPU instructions, but I don’t know if that was indeed the root cause.
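
Since the "primitive" wording suggests the error is raised by the oneDNN (MKL-DNN) CPU backend, one experiment you could run is to check whether MKL-DNN is available and then disable it for a test run. A rough sketch, assuming the torch.backends.mkldnn flag is available in your build:

    import torch

    print(torch.backends.mkldnn.is_available())  # is the oneDNN / MKL-DNN backend compiled in?
    print(torch.__config__.show())               # shows CPU capability and MKL / MKL-DNN build info

    # Disable the MKL-DNN code paths and rerun the failing backward() call.
    # If it then succeeds, the failure is likely inside a oneDNN primitive,
    # e.g. due to an unsupported CPU instruction set on the cluster nodes.
    torch.backends.mkldnn.enabled = False

If disabling it makes backward() work, that would at least narrow the problem down to the oneDNN path on that particular CPU.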

I could fix this error by removing the TensorBoard writer logging from that part of the code. It seems the TensorBoard writer was the thing using up the RAM.
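
For reference, the kind of TensorBoard scalar logging I mean looks roughly like this (the tags, log directory, and dummy loop are placeholders, not my real code), in case it helps someone check whether the writer is what grows their RAM:

    import torch
    from torch.utils.tensorboard import SummaryWriter

    # A single writer for the whole run; the log_dir and tags are examples only.
    writer = SummaryWriter(log_dir="runs/joint_model")

    num_epochs, steps_per_epoch = 2, 5
    for epoch in range(num_epochs):
        for step in range(steps_per_epoch):
            # Stand-ins for the real per-step losses.
            loss_seg = torch.rand(1)
            loss_cls = torch.rand(1)
            global_step = epoch * steps_per_epoch + step
            # Log plain Python floats, not tensors that may still hold autograd history.
            writer.add_scalar("loss/seg", loss_seg.item(), global_step)
            writer.add_scalar("loss/cls", loss_cls.item(), global_step)
        writer.flush()  # push pending events to disk after every epoch

    writer.close()      # release the writer when training finishes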