Allocating with pin_memory() fails on sm_80 device

Hello,
I have some code that has worked for a long time, but it fails when I run it on a new A100 device. I am using PyTorch 1.7.1 and cudatoolkit 11.0.

Any help is appreciated. Thanks!

(Note that a C++/CUDA program that uses cudaHostAlloc() works just fine on the A100 device)
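For reference, here is roughly the same standalone check, redone from Python with ctypes so it bypasses PyTorch entirely. This is a minimal sketch: the library name libcudart.so and flag value 0 (cudaHostAllocDefault) are assumptions about the setup; in a conda environment the library may be e.g. libcudart.so.11.0.

import ctypes

# Load the CUDA runtime directly (library name is an assumption, see above).
cudart = ctypes.CDLL("libcudart.so")

ptr = ctypes.c_void_p()
nbytes = ctypes.c_size_t(1024 * 3 * 224 * 224 * 4)  # bytes in the MWE float32 tensor

# cudaError_t cudaHostAlloc(void **pHost, size_t size, unsigned int flags)
err = cudart.cudaHostAlloc(ctypes.byref(ptr), nbytes, ctypes.c_uint(0))
print("cudaHostAlloc returned", err)  # 0 == cudaSuccess, 1 == invalid argument

if err == 0:
    cudart.cudaFreeHost(ptr)

On my machines this succeeds even where the PyTorch call below fails, which is what made me suspect the problem was not the CUDA runtime itself.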

The relevant lines from conda list are:

cudatoolkit               11.0.221             h6bb024c_0
pytorch                   1.7.1           py3.7_cuda11.0.221_cudnn8.0.5_0    pytorch

Here is an MWE of the failing line on the A100 device:

Python 3.7.9 (default, Aug 31 2020, 12:42:55) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.get_device_capability()
(8, 0)
>>> torch.cuda.get_device_properties(0)
_CudaDeviceProperties(name='A100-SXM4-40GB', major=8, minor=0, total_memory=40537MB, multi_processor_count=108)
>>> torch.cuda.get_arch_list()
['sm_37', 'sm_50', 'sm_60', 'sm_61', 'sm_70', 'sm_75', 'sm_80', 'compute_37']
>>> cpu = torch.empty(1024, 3, 224, 224, dtype=torch.float32)
>>> cpup = torch.empty(1024, 3, 224, 224, dtype=torch.float32).pin_memory()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: cuda runtime error (1) : invalid argument at /opt/conda/conda-bld/pytorch_1607370156314/work/aten/src/THC/THCCachingHostAllocator.cpp:278

Here is the same on a Quadro RTX 8000 device:

Python 3.7.9 (default, Aug 31 2020, 12:42:55) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.get_device_properties(0)
_CudaDeviceProperties(name='Quadro RTX 8000', major=7, minor=5, total_memory=48601MB, multi_processor_count=72)
>>> torch.cuda.get_device_capability(0)
(7, 5)
>>> torch.cuda.get_arch_list()
['sm_37', 'sm_50', 'sm_60', 'sm_61', 'sm_70', 'sm_75', 'sm_80', 'compute_37']
>>> cpu = torch.empty(1024, 3, 224, 224, dtype=torch.float32)
>>> cpup = torch.empty(1024, 3, 224, 224, dtype=torch.float32).pin_memory()
>>>  [ no error ]

Never mind. It turns out there was a misconfiguration in the submission file on the server running the jobs, and that was the source of this error.
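For anyone who hits the same RuntimeError: a quick sanity check before pinning, run inside the job itself, would have caught this. This is a minimal sketch; CUDA_VISIBLE_DEVICES is just the usual variable a scheduler's submission file sets, and your scheduler may use something else.

import os
import torch

# Confirm the job environment actually exposes the GPU as expected.
print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))
print("torch.cuda.is_available() =", torch.cuda.is_available())
print("torch.cuda.device_count() =", torch.cuda.device_count())

# With the environment corrected, a small pin test should succeed.
t = torch.empty(8, dtype=torch.float32).pin_memory()
print("t.is_pinned() =", t.is_pinned())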