Hello,
I have an experiment I’ve been developing for some time now, and on my old GPU server (Titan X card) it works just fine. Recently I started using a new workstation with a GeForce RTX 2080 Ti, and when I run my code I get the following error:
outputs = model(input_vars)
File "/home/user/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/user/anaconda3/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/home/user/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/user/anaconda3/lib/python3.7/site-packages/torch/nn/modules/pooling.py", line 148, in forward
self.return_indices)
File "/home/user/anaconda3/lib/python3.7/site-packages/torch/_jit_internal.py", line 132, in fn
return if_false(*args, **kwargs)
File "/home/user/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 425, in _max_pool2d
input, kernel_size, stride, padding, dilation, ceil_mode)[0]
File "/home/user/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 417, in max_pool2d_with_indices
return torch._C._nn.max_pool2d_with_indices(input, kernel_size, _stride, padding, dilation, ceil_mode)
RuntimeError: cuda runtime error (11) : invalid argument at /pytorch/aten/src/THCUNN/generic/SpatialDilatedMaxPooling.cu:120
I’ve pasted the relevant parts of the stack trace above.
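In case it helps narrow things down, the same error should be reproducible with a bare pooling call outside my model (a minimal sketch; the input shape here is arbitrary, not my real data):

import torch
import torch.nn as nn

# Minimal check: a single max-pool on a random CUDA tensor.
# If this also fails with "invalid argument", the problem is in the
# CUDA/PyTorch setup rather than anything in the model itself.
pool = nn.MaxPool2d(kernel_size=2, stride=2)
x = torch.randn(1, 3, 64, 64).cuda()  # arbitrary shape
out = pool(x)
torch.cuda.synchronize()              # make sure the kernel actually ran
print(out.shape)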
This is my nvidia-smi output:
| NVIDIA-SMI 410.78 Driver Version: 410.78 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 208... Off | 00000000:01:00.0 On | N/A |
| 0% 43C P8 13W / 260W | 139MiB / 10986MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1332 G /usr/lib/xorg/Xorg 59MiB |
| 0 1377 G /usr/bin/gnome-shell 78MiB |
+-----------------------------------------------------------------------------+
I’m pretty baffled by this and have no idea where to start. Any suggestions would be greatly appreciated. I should mention that on this machine everything runs inside the following Docker container:
docker run --runtime=nvidia nvidia/cuda:10.0-base
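For completeness, here is a quick sanity check I can run inside that container to see which CUDA version my PyTorch binary was built against and what compute capability it detects (just a diagnostic sketch; Turing cards report compute capability 7.5, which PyTorch binaries built against CUDA 9 do not support):

import torch

print(torch.__version__)                    # PyTorch build version
print(torch.version.cuda)                   # CUDA version the binary was compiled with
print(torch.cuda.is_available())            # is the GPU visible at all?
print(torch.cuda.get_device_name(0))        # should report the RTX card
print(torch.cuda.get_device_capability(0))  # Turing cards report (7, 5)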