Forward propagation in my model seems to work fine; however, as soon as I call loss.backward() I get:
File "PyTorch1.py", line 265, in <module>
loss.backward()
File "/home/riccardo/.anaconda3/envs/PyTorch/lib/python3.7/site-packages/torch/tensor.py", line 102, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/riccardo/.anaconda3/envs/PyTorch/lib/python3.7/site-packages/torch/autograd/__init__.py", line 90, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
I'm running PyTorch 1.0.0 in a conda environment; torch.backends.cudnn.version() returns 7401; the GPU is a GeForce GTX 1080 with CUDA compilation tools release 9.1 and driver version 410.48. I tried replacing my custom loss function with a simple torch.sum() and that doesn't change anything. This is the code I use to test one forward pass and one gradient computation (on random inputs):
import time
import numpy as np
import torch

A = Unet(PARAMS).cuda()
start_time = time.time()
X = torch.randn(1, 1, 256, 256, 4)

# Binarise the random targets: values > 0.5 become 1, everything below 0.6 becomes 0
FakeLabel = torch.randn(1, 3, 256, 256, 4)
FakeLabel[FakeLabel > 0.5] = 1
FakeLabel[FakeLabel < 0.6] = 0
FakeMask = torch.randn(1, 1, 256, 256, 4)
FakeMask[FakeMask > 0.5] = 1
FakeMask[FakeMask < 0.6] = 0

X = X.cuda()
optimizer = torch.optim.Adam(A.parameters())
optimizer.zero_grad()
# Mask, Label = A(X)
OUT = A(X)
WW = np.array([1.1, 1.2, 4])
loss = torch.sum(OUT)  # MonoLoss(FakeMask, Mask, 1, 1) + CateLoss(FakeLabel, Label, 1, WW)
loss = loss.cuda()  # no-op here: loss is already on the GPU
loss.backward()  # <-- RuntimeError raised here
optimizer.step()
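As a further check, I tried isolating whether cuDNN itself is at fault by disabling it before running the forward/backward pass. Here is a minimal sketch of that check (using a single Conv3d as a stand-in for my Unet, since its definition is long, and small input sizes so it also runs on CPU):

```python
import torch
import torch.nn as nn

# Disable cuDNN so convolutions fall back to the native CUDA/CPU kernels;
# if the CUDNN_STATUS_EXECUTION_FAILED error disappears, cuDNN is implicated.
torch.backends.cudnn.enabled = False

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for the real Unet(PARAMS): one 3D convolution,
# matching the (N, C, D, H, W)-style 5-D input used above.
model = nn.Conv3d(in_channels=1, out_channels=3, kernel_size=3, padding=1).to(device)

X = torch.randn(1, 1, 8, 8, 4, device=device)
optimizer = torch.optim.Adam(model.parameters())
optimizer.zero_grad()
loss = torch.sum(model(X))  # same simplified loss as in the repro above
loss.backward()
optimizer.step()

# All parameters should now have gradients if backward succeeded
print(all(p.grad is not None for p in model.parameters()))
```

If this runs cleanly with cuDNN disabled but fails with it enabled, that would point at the cuDNN 7401 / CUDA 9.1 / driver 410.48 combination rather than the model code.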