Hello,
I am training an object detection model with two losses. One of them tends to infinity, but normalizing it with the commands below fixed that:
pi_minvals = pi[..., 4].min(3, keepdim=True).values
pj_minvals = pj[..., 4].min(3, keepdim=True).values
pi_maxvals = pi[..., 4].max(3, keepdim=True).values
pj_maxvals = pj[..., 4].max(3, keepdim=True).values
pi_norm = (pi[..., 4] - pi_minvals) / (pi_maxvals - pi_minvals)
pj_norm = (pj[..., 4] - pj_minvals) / (pj_maxvals - pj_minvals)
obji = self.BCEobj(pi_norm, pj_norm)
but I still get this error during the first epoch:
loss.backward()
File "/home/hoda/lib/python3.7/site-packages/torch/tensor.py", line 245, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/home/hoda/lib/python3.7/site-packages/torch/autograd/__init__.py", line 147, in backward
allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
RuntimeError: Function 'CudnnConvolutionBackward' returned nan values in its 0th output.
Do you have any idea how to solve it?
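One thing I suspect is the min-max normalization itself: when all values along the reduced dimension are equal, `maxvals - minvals` is zero and the division produces NaN, which then propagates through `backward()`. Below is a minimal sketch of a guarded version; the helper name `minmax_norm` and the `eps` value are my own choices, not from my training code:

```python
import torch

def minmax_norm(t, dim=3, eps=1e-8):
    # Min-max normalize along `dim`; eps guards the denominator so that
    # a constant slice (max == min) yields 0 instead of 0/0 = NaN.
    minvals = t.min(dim, keepdim=True).values
    maxvals = t.max(dim, keepdim=True).values
    return (t - minvals) / (maxvals - minvals + eps)

# A constant tensor would produce NaN without the eps guard:
x = torch.ones(2, 3, 4, 5)
print(torch.isnan(minmax_norm(x)).any())  # tensor(False)
```

In my loss this would replace the two divisions above, e.g. `pi_norm = minmax_norm(pi[..., 4])`.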
Thanks