CUDNN_STATUS_INTERNAL_ERROR on CentOS?

Hi everyone,
We are trying to set up a deep learning environment on a workstation, but even the simplest examples fail with CUDNN_STATUS_INTERNAL_ERROR. I really don’t know what we could be doing wrong, so I am posting here in the hope that someone has run into similar issues and can help us.

The system we are running this on has two Titan Xp GPUs and the following software configuration:
CentOS 7.5.1804
Kernel: 3.10.0-862.2.3.el7.x86_64
Nvidia driver: 390.48
CUDA: 9.1.85
cuDNN: 7.1 (cudnn_version_7_1_3_16_c23872914_m0_e0)
pytorch: 0.4.0 (200fb22b22c5f1c5345d99743ef43764b9a8323c) (built from source, same happens when using pip installer)
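The PyTorch pip wheels typically bundle their own CUDA runtime and cuDNN, so the versions PyTorch actually loads can differ from the system-wide ones listed above. A minimal sketch for printing what PyTorch sees at runtime, assuming PyTorch 0.4.0 or later (collect_versions is a hypothetical helper name):

```python
import platform


def collect_versions():
    """Collect the library versions PyTorch reports at runtime.

    Hypothetical helper; these values may differ from the system-wide
    install because binary builds usually ship their own CUDA/cuDNN.
    """
    info = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        info["torch"] = None  # PyTorch is not importable in this environment
        return info

    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda  # CUDA version PyTorch was built against
    info["cudnn"] = torch.backends.cudnn.version()  # e.g. 7103 for cuDNN 7.1.3
    if torch.cuda.is_available():
        # One entry per visible GPU, e.g. both cards on a dual-GPU machine
        info["gpus"] = [torch.cuda.get_device_name(i)
                        for i in range(torch.cuda.device_count())]
    else:
        info["gpus"] = []
    return info


if __name__ == "__main__":
    for key, value in sorted(collect_versions().items()):
        print("{}: {}".format(key, value))
```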

The minimalistic example we are trying to run is this one:

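import torch
from torch import nn
from torch.autograd import Variable

print('cuda available: {}'.format(torch.cuda.is_available()))
print('cudnn available: {}'.format(torch.has_cudnn))

# Random 5-D input: (batch, channels, depth, height, width)
a = torch.rand((2, 3, 64, 64, 64))
a = Variable(a).cuda()  # since 0.4.0, Variable(tensor) simply returns a Tensor
# 3-D convolution: 3 -> 12 channels, kernel size 3, stride 1, padding 1
b = nn.Conv3d(3, 12, 3, 1, 1).cuda()
c = b(a)  # this call raises CUDNN_STATUS_INTERNAL_ERROR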

This gives the following output:

cuda available: True
cudnn available: True
Traceback (most recent call last):
  File "./testtorch.py", line 15, in <module>
    c = b(a)
  File "/home/bernstei/virtualenv/torch-src/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bernstei/virtualenv/torch-src/lib/python2.7/site-packages/torch/nn/modules/conv.py", line 421, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDNN_STATUS_INTERNAL_ERROR
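One way to narrow this down (a diagnostic sketch, not a known fix) is to run the same kind of convolution on each GPU separately, with cuDNN switched on and off, for both Conv2d and Conv3d. If every case passes with torch.backends.cudnn.enabled = False, the problem is likely in the cuDNN path rather than in the driver or the cards themselves. This assumes PyTorch 0.4.0 or later (for torch.device and Module.to); try_conv and run_matrix are hypothetical helper names. If one failure leaves the CUDA context unusable, later cases may fail as well, so running each case in a fresh process is the safer variant.

```python
def try_conv(device_index, use_cudnn, dims):
    """Run one small convolution; return None on success, else the error text."""
    import torch
    from torch import nn

    # Global switch: False makes PyTorch use its non-cuDNN CUDA convolution
    torch.backends.cudnn.enabled = use_cudnn
    device = torch.device("cuda:{}".format(device_index))
    if dims == 3:
        x = torch.rand(2, 3, 64, 64, 64, device=device)
        conv = nn.Conv3d(3, 12, 3, 1, 1).to(device)
    else:
        x = torch.rand(2, 3, 64, 64, device=device)
        conv = nn.Conv2d(3, 12, 3, 1, 1).to(device)
    try:
        # .item() copies to the CPU, forcing the GPU work to finish here
        conv(x).sum().item()
        return None
    except RuntimeError as exc:
        return str(exc)


def run_matrix():
    """Try every (GPU, cuDNN on/off, 2-D/3-D) combination; empty dict if no CUDA."""
    try:
        import torch
    except ImportError:
        return {}
    if not torch.cuda.is_available():
        return {}
    results = {}
    try:
        for idx in range(torch.cuda.device_count()):
            for use_cudnn in (True, False):
                for dims in (2, 3):
                    results[(idx, use_cudnn, dims)] = try_conv(idx, use_cudnn, dims)
    finally:
        torch.backends.cudnn.enabled = True  # restore the default setting
    return results


if __name__ == "__main__":
    for (idx, use_cudnn, dims), err in sorted(run_matrix().items()):
        print("gpu {} cudnn={} conv{}d: {}".format(idx, use_cudnn, dims, err or "ok"))
```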

Any help is greatly appreciated!
Kind regards,
Fabian