Hey!
I have PyTorch 1.7.1 with CUDA 11, an RTX 3080 and Ubuntu 20.04 (installed from binaries).
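For reference, the version details above can be read from inside Python. This is a minimal sketch; `environment_report` is a hypothetical helper name, and `python -m torch.utils.collect_env` prints a fuller report.

```python
import torch


def environment_report() -> dict:
    """Collect the version details usually quoted in bug reports (hypothetical helper)."""
    report = {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    # Only query the device name when a GPU is actually visible.
    if report["cuda_available"]:
        report["gpu"] = torch.cuda.get_device_name(0)
    return report


print(environment_report())
```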
I tried to train the model from this repo: https://github.com/leftthomas/SRGAN
But I got the following error:
```
Traceback (most recent call last):
  File "/home/work/projects/SRGAN/train.py", line 88, in <module>
    g_loss.backward()
  File "/home/yoad/anaconda3/envs/torch1.7.1/lib/python3.8/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/yoad/anaconda3/envs/torch1.7.1/lib/python3.8/site-packages/torch/autograd/__init__.py", line 130, in backward
    Variable._execution_engine.run_backward(
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 1024, 1, 1]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
```
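As the hint says, anomaly detection can be switched on before the training loop runs. A minimal sketch, using a hypothetical toy model in place of the SRGAN networks:

```python
import torch
import torch.nn as nn

# Enable anomaly detection globally, as the error hint suggests.
# It slows autograd down noticeably, so use it for debugging only.
torch.autograd.set_detect_anomaly(True)

# Hypothetical toy model standing in for the real training step.
model = nn.Linear(4, 1)
inputs = torch.randn(2, 4)
loss = model(inputs).pow(2).mean()

# If this backward() failed, autograd would also print the traceback of the
# forward call that created the failing node.
loss.backward()
print(torch.is_anomaly_enabled())  # True while anomaly mode is on
```

The scoped alternative, `with torch.autograd.detect_anomaly():`, limits the overhead to the wrapped code.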
So I ran the code with anomaly detection enabled, and this is the output:
```
0%| | 0/261 [00:00<?, ?it/s]
[W python_anomaly_mode.cpp:104] Warning: Error detected in CudnnConvolutionBackward. Traceback of forward call that caused the error:
  File "/home/work/projects/SRGAN/train.py", line 80, in <module>
    fake_out = netD(fake_img).mean()
  File "/home/yoad/anaconda3/envs/torch1.7.1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/work/projects/SRGAN/model.py", line 84, in forward
    return torch.sigmoid(self.net(x).view(batch_size))
  File "/home/yoad/anaconda3/envs/torch1.7.1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/yoad/anaconda3/envs/torch1.7.1/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/yoad/anaconda3/envs/torch1.7.1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/yoad/anaconda3/envs/torch1.7.1/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 423, in forward
    return self._conv_forward(input, self.weight)
  File "/home/yoad/anaconda3/envs/torch1.7.1/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 419, in _conv_forward
    return F.conv2d(input, weight, self.bias, self.stride,
 (function _print_stack)
0%| | 0/261 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/work/projects/SRGAN/train.py", line 90, in <module>
    g_loss.backward()
  File "/home/yoad/anaconda3/envs/torch1.7.1/lib/python3.8/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/yoad/anaconda3/envs/torch1.7.1/lib/python3.8/site-packages/torch/autograd/__init__.py", line 130, in backward
    Variable._execution_engine.run_backward(
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 1024, 1, 1]] is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
```
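For context, this error message can be reproduced in isolation. The reported shape `[1, 1024, 1, 1]` has the layout of a `Conv2d` weight with 1024 input channels, 1 output channel and a 1x1 kernel, which is consistent with (though not proof of) a discriminator parameter being the modified tensor. One common trigger, offered here as an assumption and not a confirmed diagnosis of the SRGAN script, is calling an optimizer's `step()` (which updates parameters in place) between a forward pass and a later `backward()` that still needs those parameters. The toy below uses a hypothetical `nn.Linear` "discriminator" shared by two losses, loosely mirroring a GAN loop; it is not the repository's code.

```python
import torch
import torch.nn as nn


def gan_style_step(step_before_backward: bool) -> str:
    """Hypothetical toy: one 'discriminator' output feeds two losses."""
    torch.manual_seed(0)
    disc = nn.Linear(8, 1)
    opt_d = torch.optim.SGD(disc.parameters(), lr=0.1)
    # Stands in for the generator output; requiring grad means the
    # backward pass needs disc.weight to compute x.grad.
    x = torch.randn(4, 8, requires_grad=True)

    out = disc(x).mean()  # forward pass saves disc.weight for backward
    d_loss = out
    g_loss = -out         # second loss reuses the same graph

    d_loss.backward(retain_graph=True)
    if step_before_backward:
        opt_d.step()      # in-place update bumps disc.weight's version counter
    try:
        g_loss.backward()  # needs the saved (pre-step) weight
    except RuntimeError as err:
        return str(err)
    if not step_before_backward:
        opt_d.step()      # stepping after all backward passes is safe
    return "ok"


print(gan_style_step(True))   # the "modified by an inplace operation" error text
print(gan_style_step(False))  # ok
```

In this toy, calling `step()` only after every `backward()` that depends on the old parameter values avoids the version mismatch.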
I'm not sure where to look for the in-place operation. Any ideas?
Thanks!