PyTorch detects GPU but can't run model on GPU - A6000

I am working on a Paperspace VM with an A6000 GPU and have a transformers/PyTorch model there.
The model runs on the CPU, but I can't get it to run on this GPU.
PyTorch must stay at version 1.4 (the model requires it).
Below is the setup of the VM along with sample code.
Interestingly, the same model with the same settings (except nvcc, since no toolkit is installed there) runs fine on a P9000 from Paperspace.

Does anybody have an idea how I can get the model to run on the A6000 anyway?

import transformers, torch
print(transformers.__version__)
print(torch.__version__)
3.2.0
1.4.0+cu92

torch.cuda.is_available()
True

torch.cuda.device_count()
1

torch.cuda.current_device()
0

torch.cuda.device(0)
<torch.cuda.device at 0x7f679aac13d0>

torch.cuda.get_device_name(0)
'NVIDIA RTX A6000'

torch.version.cuda
'9.2'

nvidia-smi
Thu Aug 11 19:21:54 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A6000    On   | 00000000:00:05.0 Off |                  Off |
| 30%   35C    P8    18W / 300W |      5MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1425      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+

gcc --version
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

g++ --version
g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.4 LTS
Release:	20.04
Codename:	focal

x = torch.rand(5, 3, device="cpu")
print(x)
tensor([[0.8225, 0.6572, 0.9615],
        [0.5285, 0.8547, 0.3622],
        [0.8873, 0.2297, 0.5951],
        [0.1628, 0.1683, 0.3286],
        [0.5517, 0.0598, 0.7807]])

x = torch.rand(5, 3, device="cuda:0")
print(x)
# hangs in an infinite loop, never terminates, no error message

That won’t work on your A6000: the Ampere architecture needs CUDA 11.x, so you cannot use the CUDA 9.2 runtime your PyTorch build ships with.
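To illustrate the constraint, here is a small plain-Python sketch. The architecture-to-minimum-CUDA mapping follows NVIDIA's release notes, but treat the exact table as an assumption for illustration:

# Minimum CUDA toolkit version that can compile for a given GPU
# architecture (sm_XX). Values follow NVIDIA release notes; treat
# the exact table as an assumption.
MIN_CUDA_FOR_ARCH = {
    "sm_70": (9, 0),    # Volta  (e.g. V100)
    "sm_75": (10, 0),   # Turing (e.g. RTX 2080)
    "sm_80": (11, 0),   # Ampere (e.g. A100)
    "sm_86": (11, 1),   # Ampere (e.g. RTX A6000)
}

def cuda_supports(arch: str, cuda_version: tuple) -> bool:
    """Return True if a CUDA toolkit of `cuda_version` can target `arch`."""
    return cuda_version >= MIN_CUDA_FOR_ARCH[arch]

print(cuda_supports("sm_70", (9, 2)))   # V100 with CUDA 9.2 -> True
print(cuda_supports("sm_86", (9, 2)))   # RTX A6000 with CUDA 9.2 -> False

This is also why the kernel launch above hangs rather than erroring out: the CUDA 9.2 build simply contains no code compiled for sm_86.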

Thank you for the quick reply. That explains a lot, of course.

I need a GPU with more than 24GB GPU RAM. Do you have any idea where I can find a cloud solution?

Paperspace also offers a V100. Would this work? I’m guessing A100 won’t work either.

CUDA 9.2 supports compute capability 7.0, so a Volta GPU such as the V100 should work.
The A100 will not work with CUDA 9.2, as it is also an Ampere GPU (sm_80) and thus needs CUDA 11.

In any case, your best bet would be to update to the latest PyTorch release and use a current CUDA runtime.

Thanks for the answer.

I would like to update to the latest version, but I am working with a model that was released last year against PyTorch 1.4, and a quick-and-dirty update attempt did not work.
I am also not an expert in PyTorch.

Edit:
Which PyTorch version would I need, at a minimum, for the A6000?

PyTorch 1.7.0 introduced CUDA 11 support, but note that this version is already quite old (released in late October 2020), so I would still recommend trying to update your repository to be compatible with the latest stable release.
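For reference, a current PyTorch wheel bundling a CUDA 11.x runtime can be installed via pip. The index URL below follows the pattern from pytorch.org; the cu113 tag is an example and should be checked against the current install matrix there:

# Install a PyTorch build that ships with a CUDA 11.x runtime.
# Check https://pytorch.org for the currently recommended command;
# the cu113 tag here is an example and may be outdated.
pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu113

Afterwards you can confirm the build targets your GPU with `python -c "import torch; print(torch.version.cuda, torch.cuda.get_arch_list())"` and check that `sm_86` appears in the list.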