RuntimeError: Function 'DivBackward0' returned nan values in its 1th output

Dear all,
I know this problem has been asked a million times already, but right now I’m facing it myself and really need your help.
I have a snippet as follows:

import torch

torch.autograd.set_detect_anomaly(True)
temp = torch.load('temp.pt')
clean = temp['clean'].clone().detach().requires_grad_(True)
noise = temp['noise'].clone().detach().requires_grad_(True)

sm_mask = torch.exp(clean) / (torch.exp(clean) + torch.exp(noise) + 1e-12)
sq_mask = torch.sqrt(sm_mask)

sq_mask.mean().backward()  # <----- error here
print('')

For the clean matrix, its max value is 0.004 and its min is 0.
For the noise matrix, its max value is 14 and its min is 0.09.

That means there is neither nan nor inf in either of them, and sm_mask has no negative entries either, since exp() is always strictly positive.
Can anyone please give me an idea why torch.sqrt(sm_mask) causes the error?
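
For reference, this is the kind of sanity check I ran on the two tensors (just a quick sketch):

import torch

temp = torch.load('temp.pt')
clean = temp['clean']
noise = temp['noise']

# both should print True if there are no nan/inf entries
print(torch.isfinite(clean).all().item(), torch.isfinite(noise).all().item())
print(clean.min().item(), clean.max().item())
print(noise.min().item(), noise.max().item())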

PS: If I run

sq_mask = torch.sqrt(sm_mask + 1e-12)  # instead of torch.sqrt(sm_mask)

Then there is no error. Why is that?

PS2: I’m sorry, here is the stacktrace:

[W ..\torch\csrc\autograd\python_anomaly_mode.cpp:104] Warning: Error detected in DivBackward0. Traceback of forward call that caused the error:
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2020.2\plugins\python-ce\helpers\pydev\pydevd.py", line 2173, in <module>
    main()
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2020.2\plugins\python-ce\helpers\pydev\pydevd.py", line 2164, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2020.2\plugins\python-ce\helpers\pydev\pydevd.py", line 1476, in run
    return self._exec(is_module, entry_point_fn, module_name, file, globals, locals)
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2020.2\plugins\python-ce\helpers\pydev\pydevd.py", line 1483, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2020.2\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "D:/Master/JAIST/Proposal/Code/Master3/Test.py", line 8, in <module>
    sm_mask = torch.exp(clean) / (torch.exp(clean) + torch.exp(noise) + 1e-12)
 (function _print_stack)
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2020.2\plugins\python-ce\helpers\pydev\pydevd.py", line 1483, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2020.2\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "D:/Master/JAIST/Proposal/Code/Master3/Test.py", line 13, in <module>
    sq_mask.mean().backward()
  File "C:\Users\duyvo\anaconda3\envs\Code\lib\site-packages\torch\_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "C:\Users\duyvo\anaconda3\envs\Code\lib\site-packages\torch\autograd\__init__.py", line 154, in backward
    Variable._execution_engine.run_backward(
RuntimeError: Function 'DivBackward0' returned nan values in its 1th output.
python-BaseException

Process finished with exit code 1

Can you check that your loss value is ok before calling .backward()? Also, the derivative of sqrt(x) is 0.5 * x**(-0.5), so check that your input is strictly positive too.
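
For example (just a rough sketch), you could add checks like these right before the backward call:

# sketch: sanity checks before calling backward
assert torch.isfinite(sq_mask).all(), "loss input contains nan/inf"
assert (sm_mask > 0).all(), "sqrt input is not strictly positive"
sq_mask.mean().backward()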


Dear @AlphaBetaGamma96 ,
Thank you very much for the reply.

Yes, the input x to sqrt(x) is ok, but its minimum value is zero.
I thought that was fine since sqrt(0) = 0. Now that you mention the derivative of sqrt(x), I think the zeros in x are the issue here. Am I correct?

Most likely, as 0^(-0.5) is undefined, so the input should be strictly positive, which is most likely why it works when you include a small offset value.
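
If you don’t want the offset added to every entry, one alternative (just a sketch, not tested on your data) is to clamp the input away from zero:

# sketch: keep the sqrt input strictly positive by clamping instead of
# adding an offset; gradients near eps are large but stay finite
eps = 1e-12
sq_mask = torch.sqrt(torch.clamp(sm_mask, min=eps))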


Yes, I think so too. But what bothers me is the stack trace: why is the ‘DivBackward0’ function causing the error and not ‘SqrtBackward0’?
And also, when I run this snippet

te = torch.zeros(4, requires_grad=True)
sq_mask = torch.sqrt(te)

sq_mask.mean().backward()

then it runs without any error. Why is that?
By adding the small offset, the model runs fine, but I’m still bothered, because this fix doesn’t seem to match the error at all.
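
For reference, here is how I’d inspect the gradient in that small example (just a sketch):

import torch

torch.autograd.set_detect_anomaly(True)

te = torch.zeros(4, requires_grad=True)
sq_mask = torch.sqrt(te)
sq_mask.mean().backward()

# the gradient of sqrt at 0 is 0.5 / sqrt(0), i.e. inf rather than nan,
# which is presumably why anomaly detection does not complain here
print(te.grad)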


Because the sqrt of zero is 0, but dividing by 0 is undefined; that’s why the stack trace has an issue with that command.


Yeah, I think that clears up my confusion for now.
The actual DivBackward culprit was the 1/(2*sqrt(x)) in the derivative of sqrt. I thought

sm_mask = torch.exp(clean) / (torch.exp(clean) + torch.exp(noise) + 1e-12)

this forward division was the one causing it.
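
For anyone finding this thread later, a tiny check makes that concrete (just a sketch):

import torch

# sketch: the gradient of sqrt(x) is 0.5 / sqrt(x), which grows without
# bound as x approaches 0
x = torch.tensor(1e-6, requires_grad=True)
torch.sqrt(x).backward()
print(x.grad)                        # roughly 500
print(0.5 / torch.sqrt(x.detach()))  # matches the analytic derivative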

Anyways, thank you very much.
