Very slow/stuck(?) torch.matmul operation on NVIDIA RTX 6000 Ada

Mona_Jalal · October 24, 2023, 12:56pm

I have this test script to debug another problem I have with torch.matmul. Please note that I need to use these specific versions of PyTorch and torchvision to reproduce the results of the HybridPose framework.

test.py is:

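import torch

# Small operands are enough to exercise the cuBLAS matmul path on the GPU.
a = torch.rand(2, 3, device='cuda')
b = torch.rand(3, 2, device='cuda')

try:
    c = torch.matmul(a, b)
    # CUDA calls are asynchronous; synchronize so a failure surfaces here.
    torch.cuda.synchronize()
    print(c)
except RuntimeError as e:
    print(e)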

After running, it just appears to be stuck.

Here’s the environment setup:

$ conda list
# packages in environment at /home/mona/anaconda3/envs/hp:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
_openmp_mutex             5.1                       1_gnu  
_pytorch_select           0.2                       gpu_0  
blas                      1.0                         mkl  
ca-certificates           2023.08.22           h06a4308_0  
certifi                   2022.12.7        py37h06a4308_0  
cffi                      1.15.0           py37h7f8727e_0  
cudatoolkit               10.0.130                      0  
cudnn                     7.6.5                cuda10.0_0  
freetype                  2.12.1               h4a9f257_0  
giflib                    5.2.1                h5eee18b_3  
intel-openmp              2022.1.0          h9e868ea_3769  
joblib                    1.3.2                    pypi_0    pypi
jpeg                      9e                   h5eee18b_1  
lcms2                     2.12                 h3be6417_0  
lerc                      3.0                  h295c915_0  
libdeflate                1.17                 h5eee18b_1  
libedit                   3.1.20221030         h5eee18b_0  
libffi                    3.2.1             hf484d3e_1007  
libgcc-ng                 11.2.0               h1234567_1  
libgomp                   11.2.0               h1234567_1  
libpng                    1.6.39               h5eee18b_0  
libstdcxx-ng              11.2.0               h1234567_1  
libtiff                   4.5.1                h6a678d5_0  
libwebp                   1.2.4                h11a3e52_1  
libwebp-base              1.2.4                h5eee18b_1  
lz4-c                     1.9.4                h6a678d5_0  
mkl                       2020.2                      256  
mkl-service               2.3.0            py37he8ac12f_0  
mkl_fft                   1.3.0            py37h54f3939_0  
mkl_random                1.1.1            py37h0573a6f_0  
ncurses                   6.4                  h6a678d5_0  
ninja                     1.10.2               h06a4308_5  
ninja-base                1.10.2               hd09550d_5  
numpy                     1.19.2           py37h54aff64_0  
numpy-base                1.19.2           py37hfa32c7d_0  
opencv-python             4.8.1.78                 pypi_0    pypi
openssl                   1.1.1w               h7f8727e_0  
pillow                    6.2.2                    pypi_0    pypi
pip                       22.3.1           py37h06a4308_0  
pycparser                 2.21               pyhd3eb1b0_0  
python                    3.7.4                h265db76_1  
pytorch                   1.2.0           cuda100py37h938c94c_0  
readline                  7.0                  h7b6447c_5  
scikit-learn              0.21.3                   pypi_0    pypi
scipy                     1.7.3                    pypi_0    pypi
setuptools                65.5.1                   pypi_0    pypi
six                       1.16.0             pyhd3eb1b0_1  
sqlite                    3.33.0               h62c20be_0  
tk                        8.6.12               h1ccaba5_0  
torchvision               0.4.0           cuda100py37hecfc37a_0  
wheel                     0.38.4           py37h06a4308_0  
xz                        5.4.2                h5eee18b_0  
zlib                      1.2.13               h5eee18b_0  
zstd                      1.5.5                hc292b87_0

and

(hp) mona@mona-ThinkStation-P7:~$ python
Python 3.7.4 (default, Aug 13 2019, 20:35:49) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.2.0'
>>> import torchvision
>>> torchvision.__version__
'0.4.0a0'
>>>

and

(hp) mona@mona-ThinkStation-P7:~$ python -c "import torch; print(torch.version.cuda)"
10.0.130

and

(hp) mona@mona-ThinkStation-P7:~$ uname -a
Linux mona-ThinkStation-P7 6.2.0-35-generic #35~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Oct  6 10:23:26 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
(hp) mona@mona-ThinkStation-P7:~$ lsb_release -a
LSB Version:	core-11.1.0ubuntu4-noarch:security-11.1.0ubuntu4-noarch
Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.3 LTS
Release:	22.04
Codename:	jammy

and

(hp) mona@mona-ThinkStation-P7:~$ nvidia-smi
Tue Oct 24 08:35:58 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12             Driver Version: 535.104.12   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX 6000 Ada Gene...    On  | 00000000:52:00.0  On |                  Off |
| 30%   59C    P2              73W / 300W |   4008MiB / 49140MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2417      G   /usr/lib/xorg/Xorg                          452MiB |
|    0   N/A  N/A      2597      G   /usr/bin/gnome-shell                         68MiB |
|    0   N/A  N/A      3098      G   ...AAAAAAAACAAAAAAAAAA= --shared-files       57MiB |
|    0   N/A  N/A      3447      G   ...irefox/3252/usr/lib/firefox/firefox      357MiB |
|    0   N/A  N/A      8414      C   python                                      608MiB |
|    0   N/A  N/A      8704      C   python                                      654MiB |
|    0   N/A  N/A      8973      C   python                                      692MiB |
|    0   N/A  N/A      9484      G   ...sion,SpareRendererForSitePerProcess      111MiB |
|    0   N/A  N/A     12323      C   python                                      890MiB |
+---------------------------------------------------------------------------------------+

and

(hp) mona@mona-ThinkStation-P7:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

And here’s the original error I got when I started training from the pretrained weights for the ape class of the LINEMOD dataset with the HybridPose framework:

(hp) mona@mona-ThinkStation-P7:~/HP/HybridPose$ LD_LIBRARY_PATH=lib/regressor:$LD_LIBRARY_PATH python src/train_core.py --load_dir /home/mona/HP/HybridPose/saved_weights/linemod/ape/checkpoints/0.001/199 --object_name ape
number of model parameters: 12959563
loading checkpoint from /home/mona/HP/HybridPose/saved_weights/linemod/ape/checkpoints/0.001/199
Successfully loaded model from /home/mona/HP/HybridPose/saved_weights/linemod/ape/checkpoints/0.001/199
/home/mona/anaconda3/envs/hp/lib/python3.7/site-packages/torch/nn/functional.py:1350: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
  warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
Traceback (most recent call last):
  File "src/train_core.py", line 114, in <module>
    trainer.generate_data(val_loader)
  File "./trainers/coretrainer.py", line 572, in generate_data
    pts2d_pred_loc, pts2d_pred_var = self.vote_keypoints(pts2d_map_pred, mask_pred)
  File "./trainers/coretrainer.py", line 324, in vote_keypoints
    mean, var = estimate_voting_distribution_with_mean(mask, pts2d_map, mean)
  File "/home/mona/HP/HybridPose/lib/ransac_voting_gpu_layer/ransac_voting_gpu.py", line 400, in estimate_voting_distribution_with_mean
    cov=torch.matmul(diff_pts.transpose(2,3), weighted_diff_pts)  # b,vn,2,2
RuntimeError: cublas runtime error : the GPU program failed to execute at /tmp/pip-req-build-58y_cjjl/aten/src/THC/THCBlas.cu:331

The git repo is chensong1995/HybridPose on GitHub: "HybridPose: 6D Object Pose Estimation under Hybrid Representation" (CVPR 2020).

Please note the requirements.txt for this repo states these exact versions for pytorch, torchvision, and cudatoolkit:

(hp) mona@mona-ThinkStation-P7:~/HP/HybridPose$ cat requirements.txt 
pillow>=6.2.2
pytorch==1.2.0
torchvision==0.4.0
cudatoolkit==10.0.130
opencv==3.4.7
setuptools==65.5.1
scikit-learn==0.21.3

Mona_Jalal · October 24, 2023, 1:39pm

Please note that it didn’t stay stuck; it was just very slow. Unfortunately, the cublas error I mentioned is still there.

ptrblck · October 24, 2023, 3:16pm

Your RTX 6000 Ada needs CUDA >= 11.x as it has compute capability 8.9. Your current PyTorch installation ships with CUDA 10, which is not compatible with your GPU, so update PyTorch to any binary built with CUDA 11.x.
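
To make the compatibility point above concrete, here is a minimal sketch of how one might compare a GPU's compute capability against the architectures a PyTorch binary was built for. The helper names (parse_arch, binary_can_run_on, describe_local_gpu) are hypothetical, and the rule encoded here is the general CUDA one: a compiled "sm_XY" kernel runs on a device with the same major version and an equal or higher minor version, while "compute_XY" PTX can be JIT-compiled for an equal or newer device. torch.cuda.get_arch_list() is only present in more recent PyTorch releases, so the sketch assumes it may be missing (as is likely on 1.2.0) and falls back gracefully. It checks PyTorch's own kernels only; bundled libraries such as cuBLAS are tied to the CUDA toolkit version and can still fail on a GPU that toolkit predates.

```python
import re


def parse_arch(arch):
    """Split an arch string such as 'sm_86' or 'compute_75' into (kind, major, minor)."""
    match = re.fullmatch(r"(sm|compute)_(\d+)", arch)
    if match is None:
        return None  # unrecognized entry (e.g. a suffixed arch); callers skip it
    major, minor = divmod(int(match.group(2)), 10)
    return match.group(1), major, minor


def binary_can_run_on(capability, arch_list):
    """Return True if a build with `arch_list` has kernels usable on `capability`.

    `capability` is a (major, minor) tuple, e.g. (8, 9) for the RTX 6000 Ada.
    """
    major, minor = capability
    for arch in arch_list:
        parsed = parse_arch(arch)
        if parsed is None:
            continue
        kind, a_major, a_minor = parsed
        # Compiled kernels: same major version, built for an equal or older minor.
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        # PTX: can be JIT-compiled for any equal or newer device.
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


def describe_local_gpu():
    """Report on the local GPU, if PyTorch and a CUDA device are available."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "no usable CUDA device"
    capability = torch.cuda.get_device_capability(0)
    # Assumption: older builds may not provide torch.cuda.get_arch_list.
    get_arch_list = getattr(torch.cuda, "get_arch_list", None)
    if get_arch_list is None:
        return f"capability {capability}; this PyTorch build cannot list its architectures"
    supported = binary_can_run_on(capability, get_arch_list())
    return f"capability {capability}; supported by this build: {supported}"
```

With an illustrative (not actual) pre-Ampere arch list such as ["sm_35", "sm_50", "sm_60", "sm_70", "sm_75"], binary_can_run_on((8, 9), ...) returns False, whereas a list containing "sm_80" or "sm_86" returns True. Running print(describe_local_gpu()) inside the conda environment reports on the installed build.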