Hi,
I am facing the same issue. Setting cudnn.benchmark=False did not help (it was already set to False from the beginning). My code crashes on the second call to one function (I ran with CUDA_LAUNCH_BLOCKING=1 to find out where the error occurred). Any pointers to the cause and how to fix it? Thanks.
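For anyone reproducing this, here is a minimal sketch of the debugging setup described above. It assumes the environment variable is set from inside the script rather than in the shell; either should work as long as it happens before the first CUDA call.

```python
import os

# Set before the first CUDA call (safest: before importing torch) so that
# kernel launches run synchronously and the traceback points at the failing op.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch

# Disable the cuDNN autotuner, as tried in this thread.
torch.backends.cudnn.benchmark = False
```

The shell equivalent is to prefix the command with CUDA_LAUNCH_BLOCKING=1.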
File "../libs/bn.py", line 109, in forward
self.training, self.momentum, self.eps, self.activation, self.slope)
File "../libs/functions.py", line 99, in forward
running_mean.mul_((1 - ctx.momentum)).add_(ctx.momentum * mean)
RuntimeError: CUDA error: an illegal memory access was encountered
Trying to print the tensor running_mean during the second call raises the following error:
print(running_mean)
File "..../Venvs/pytorch.1.0.1/lib/python3.7/site-packages/torch/tensor.py", line 66, in __repr__
return torch._tensor_str._str(self)
File "..../Venvs/pytorch.1.0.1/lib/python3.7/site-packages/torch/_tensor_str.py", line 277, in _str
tensor_str = _tensor_str(self, indent)
File "..../Venvs/pytorch.1.0.1/lib/python3.7/site-packages/torch/_tensor_str.py", line 195, in _tensor_str
formatter = _Formatter(get_summarized_data(self) if summarize else self)
File "..../Venvs/pytorch.1.0.1/lib/python3.7/site-packages/torch/_tensor_str.py", line 84, in __init__
nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
File "..../Venvs/pytorch.1.0.1/lib/python3.7/site-packages/torch/functional.py", line 271, in isfinite
return (tensor == tensor) & (tensor.abs() != inf)
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /pytorch/aten/src/THC/generated/../THCTensorMathCompareT.cuh:69
So running_mean seems to contain inf values!
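One way to test the inf suspicion without going through print is to copy the tensor to the host and check it there. This is only a sketch: is_all_finite is a hypothetical helper, not part of the code in the traceback, and if the CUDA context has already hit the illegal memory access, the copy to the CPU may raise the same error.

```python
import torch


def is_all_finite(t):
    # Hypothetical helper: copy to host memory first so that the check
    # itself does not launch a CUDA comparison kernel.
    host = t.detach().cpu()
    return bool(torch.isfinite(host).all())


# Quick CPU demonstration with made-up values.
print(is_all_finite(torch.tensor([0.5, 1.0])))
print(is_all_finite(torch.tensor([0.5, float("inf")])))
```

Calling it on running_mean and mean right before the update in functions.py would show whether the values are bad before the crash or whether the memory access itself is the problem.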
It seems to be an issue with the machine the code is running on, more specifically a CUDA-related one: the same code runs fine on the CPU.
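To compare machines or devices, the failing update can be isolated into a small script. The sketch below reuses the expression from the traceback; update_running_mean, the tensor sizes, and the momentum value are made up for illustration.

```python
import torch


def update_running_mean(running_mean, mean, momentum):
    # Same in-place update as the line that fails in the traceback.
    running_mean.mul_(1 - momentum).add_(momentum * mean)
    return running_mean


# CPU reference run with made-up values.
cpu_result = update_running_mean(torch.zeros(4), torch.ones(4), momentum=0.1)

if torch.cuda.is_available():
    # Repeat on the GPU, calling twice since the crash shows up on the second call.
    rm = torch.zeros(4, device="cuda")
    bm = torch.ones(4, device="cuda")
    update_running_mean(rm, bm, momentum=0.1)
    update_running_mean(rm, bm, momentum=0.1)
    print(rm.cpu())
```

If this small script also fails on the affected machine, that points at the driver or CUDA installation rather than at the model code.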
Fix and possible explanation.