cuBLAS runtime error "the GPU program failed to execute" with CUDA 10 and PyTorch 1.2.0

Hi,

I am using PyTorch 1.2.0 with CUDA 10.0.130 to train a neural network on a GeForce RTX 2080 Ti GPU, and I am getting a cuBLAS runtime error (traceback below). The same code runs successfully with PyTorch 1.1, CUDA 9, and a GeForce GTX 1080, so the problem probably lies in my environment configuration. Aren't PyTorch 1.2.0, CUDA 10, and the 2080 Ti all compatible? More details about my environment are listed below the traceback. I appreciate any help resolving this issue, thank you!

Error traceback

CUDA runtime error: misaligned address (74) in magma_queue_destroy_internal at /opt/conda/conda-bld/magma-cuda100_1549065924616/work/interface_cuda/interface.cpp:944
CUDA runtime error: misaligned address (74) in magma_queue_destroy_internal at /opt/conda/conda-bld/magma-cuda100_1549065924616/work/interface_cuda/interface.cpp:945
CUDA runtime error: misaligned address (74) in magma_queue_destroy_internal at /opt/conda/conda-bld/magma-cuda100_1549065924616/work/interface_cuda/interface.cpp:946
Traceback (most recent call last):
  File "/dscrhome/snb21/autoencoded-vocal-analysis-master/mouse_sylls_mwe.py", line 150, in <module>
    model.train_loop(loaders, epochs=600, test_freq=None)
  File "/hpchome/mooneylab/snb21/autoencoded-vocal-analysis-master/ava/models/vae.py", line 422, in train_loop
    loss = self.train_epoch(loaders['train'])
  File "/hpchome/mooneylab/snb21/autoencoded-vocal-analysis-master/ava/models/vae.py", line 354, in train_epoch
    loss.backward()
  File "/dscrhome/snb21/.conda/envs/ava2/lib/python3.7/site-packages/torch/tensor.py", line 118, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/dscrhome/snb21/.conda/envs/ava2/lib/python3.7/site-packages/torch/autograd/__init__.py", line 93, in backward
    allow_unreachable=True) # allow_unreachable flag
RuntimeError: cublas runtime error : the GPU program failed to execute at /opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/THC/THCBlas.cu:331
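CUDA kernels are launched asynchronously, so the cuBLAS failure reported inside loss.backward() (and the MAGMA "misaligned address" messages printed before it) may have been triggered by an earlier kernel rather than the call named in the traceback. A debugging sketch, under the assumption that one training step can be wrapped in a callable: it sets CUDA_LAUNCH_BLOCKING=1 so errors are reported at the call that caused them, and runs the step under torch.autograd.detect_anomaly(), which prints the traceback of the forward operation behind a failing backward function. The helper backward_with_anomaly_check and its compute_loss argument are hypothetical names, and the tensors in the usage block are placeholders, not the model from vae.py.

```python
import os

# CUDA kernels are launched asynchronously, so an error can surface in a later
# call (such as a cuBLAS GEMM inside loss.backward()) rather than the one that
# caused it. The variable should be set before CUDA is initialised, so it is
# set here, before torch is imported.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")

import torch


def backward_with_anomaly_check(compute_loss):
    """Run one forward/backward pass with autograd anomaly detection enabled.

    `compute_loss` is a hypothetical zero-argument callable standing in for the
    forward pass in train_epoch; it must return a scalar loss tensor. If a
    backward function fails, anomaly mode also prints the traceback of the
    forward operation that created it.
    """
    with torch.autograd.detect_anomaly():
        loss = compute_loss()
        loss.backward()
    return loss.item()


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Placeholder tensors; the real model and data from vae.py are not shown.
    w = torch.randn(64, 32, device=device, requires_grad=True)
    x = torch.randn(16, 64, device=device)
    print(backward_with_anomaly_check(lambda: (x @ w).pow(2).mean()))
```

Anomaly detection slows training noticeably, so it is only meant for a short debugging run, not the full 600 epochs.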

Environment Details
PyTorch version: 1.2.0
Is debug build: No
CUDA used to build PyTorch: 10.0.130

OS: Red Hat Enterprise Linux Server release 7.7 (Maipo)
GCC version: (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39)
CMake version: version 2.8.12.2

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration: GPU 0: GeForce RTX 2080 Ti
Nvidia driver version: 418.39
cuDNN version: /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7.5.0

Versions of relevant libraries:
[pip] numpy==1.16.4
[pip] torch==1.2.0
[pip] torchvision==0.4.0a0+6b959ee
[conda] blas 1.0 mkl
[conda] magma-cuda100 2.5.1 1 pytorch
[conda] mkl 2019.4 243
[conda] mkl-service 2.3.0 py37he904b0f_0
[conda] mkl_fft 1.0.14 py37ha843d7b_0
[conda] mkl_random 1.0.2 py37hd81dba3_0
[conda] pytorch 1.2.0 py3.7_cuda10.0.130_cudnn7.6.2_0 pytorch
[conda] torchvision 0.4.0 py37_cu100 pytorch
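The report above mixes system-wide and conda-installed components: the cuDNN entry points to libcudnn 7.5.0 under /usr/local/cuda-10.1, while the conda pytorch package string says cudnn7.6.2. To see which versions the running Python process actually uses, a short sketch like the following may help. The function name describe_cuda_runtime is hypothetical; the torch calls are standard. For reference, an RTX 2080 Ti (Turing) reports compute capability (7, 5), which CUDA 10.0 supports.

```python
import torch


def describe_cuda_runtime():
    """Collect the library versions this Python process actually sees."""
    info = {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,         # CUDA version PyTorch was compiled against
        "cudnn": torch.backends.cudnn.version(),  # cuDNN version in use (None if unavailable)
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        idx = torch.cuda.current_device()
        info["device"] = torch.cuda.get_device_name(idx)
        # RTX 2080 Ti (Turing) should report (7, 5)
        info["capability"] = torch.cuda.get_device_capability(idx)
    return info


if __name__ == "__main__":
    for key, value in describe_cuda_runtime().items():
        print(f"{key}: {value}")
```

If the cudnn value printed here differs from what the system path suggests, the process is using the library bundled with the conda package rather than the one in /usr/local/cuda-10.1.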

Could you post a (minimal) code snippet to reproduce this issue?
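As a starting point for such a snippet, here is a hypothetical minimal script that exercises the same library named in the error: on CUDA tensors, nn.Linear's forward and backward passes typically dispatch to cuBLAS matrix multiplications. The function name minimal_repro and all layer and batch sizes are placeholders, not the dimensions of the original VAE; substituting the real shapes, or the smallest part of the model that still fails, is what would make it a useful reproduction.

```python
import torch
import torch.nn as nn


def minimal_repro(batch=64, in_features=128, out_features=64, steps=5):
    """Hypothetical minimal reproduction: repeated linear forward/backward.

    Sizes are placeholders, not the layer sizes of the original model.
    """
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    torch.manual_seed(0)
    layer = nn.Linear(in_features, out_features).to(device)
    optimizer = torch.optim.Adam(layer.parameters(), lr=1e-3)
    losses = []
    for _ in range(steps):
        x = torch.randn(batch, in_features, device=device)
        loss = layer(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()  # backward GEMMs run here, as in train_epoch
        optimizer.step()
        # .item() synchronises with the GPU, so a CUDA error would be raised
        # during the step that triggered it
        losses.append(loss.item())
    return losses


if __name__ == "__main__":
    print(minimal_repro())
```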