PyTorch and CUDA environment setup issue

Hi, all!

I’m trying to set up the conda environment from [link] on my machine.

Below is my machine's specification from the nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A5000    Off  | 00000000:21:00.0 Off |                  Off |
| 30%   57C    P2    68W / 230W |   1404MiB / 24564MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

The requirements given in the GitHub repo are:

  • python=3.8.3
  • pytorch=1.7.0
  • torchvision=0.8.1
  • cudatoolkit=10.1
    =====================

Set up a Docker environment using the image nvcr.io/nvidia/pytorch:20.09-py3.

During setup, I got this error:
NVIDIA NVIDIA RTX A5000 with CUDA capability sm_86 is not compatible with the current PyTorch installation
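
For anyone hitting the same message: it generally indicates that the installed PyTorch binary was not built with kernels for the GPU's architecture (CUDA 10.1 predates Ampere, so a cudatoolkit=10.1 build cannot target sm_86). Below is a minimal sketch of how this could be checked. `arch_supported` is a hypothetical helper, not a PyTorch API, and its "same major architecture" matching rule is an assumption; the example arch lists are illustrative, not taken from any particular build. `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list` are the PyTorch calls it relies on, and that part only runs if PyTorch and a GPU are present.

```python
import re


def arch_supported(capability, arch_list):
    """Return True if arch_list contains a compiled kernel (sm_XY) whose
    major version matches the device's compute capability.

    The major-version rule is an assumption made for this sketch.
    """
    major = capability[0]
    for arch in arch_list:
        match = re.fullmatch(r"sm_(\d+)[a-z]?", arch)
        if match and int(match.group(1)) // 10 == major:
            return True
    return False


# Illustrative arch lists only, not read from a real PyTorch build.
pre_ampere_build = ["sm_37", "sm_50", "sm_60", "sm_70", "sm_75"]
ampere_build = pre_ampere_build + ["sm_80", "sm_86"]

print(arch_supported((8, 6), pre_ampere_build))  # False
print(arch_supported((8, 6), ampere_build))      # True

# Optional: query the real environment when PyTorch and a GPU are available.
try:
    import torch
except ImportError:
    torch = None

if torch is not None and torch.cuda.is_available():
    capability = torch.cuda.get_device_capability(0)
    build_archs = torch.cuda.get_arch_list()
    print("device capability:", capability)
    print("build arch list:", build_archs)
    print("supported:", arch_supported(capability, build_archs))
```

If the last line prints False on the A5000, the binary itself is the problem and a build made against CUDA 11.1 or newer is needed.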

I went through this post, which recommended upgrading PyTorch.
I configured a new conda environment that includes:

  • python=3.8.3
  • pytorch=1.9.0
  • torchvision=0.10.0
  • cudatoolkit=11.1
    =====================

Now training runs without errors, but the intermediate generated images look very different from what I expect.
I don't see this issue when installing and training on a machine with an RTX 2080.
Could this be because the required environment depends on a specific CUDA version or GPU architecture?
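
One architecture-related difference that might be worth ruling out is TF32. Ampere GPUs such as the RTX A5000 (sm_86) support it, Turing GPUs such as the RTX 2080 (sm_75) do not, and PyTorch 1.9 allows TF32 for matmuls and cuDNN convolutions by default. Whether it explains the odd images here is only a guess. The sketch below disables TF32 so the two machines can be compared under plain FP32; `tf32_capable` is a hypothetical helper, while the two `allow_tf32` flags are existing PyTorch settings.

```python
def tf32_capable(capability):
    """TF32 tensor cores are available from compute capability 8.0 (Ampere) on."""
    return tuple(capability) >= (8, 0)


print(tf32_capable((8, 6)))  # RTX A5000 -> True
print(tf32_capable((7, 5)))  # RTX 2080  -> False

# Optional: apply the setting when PyTorch and a GPU are available.
try:
    import torch
except ImportError:
    torch = None

if torch is not None and torch.cuda.is_available():
    if tf32_capable(torch.cuda.get_device_capability(0)):
        # Force full FP32 precision for matmuls and cuDNN convolutions.
        torch.backends.cuda.matmul.allow_tf32 = False
        torch.backends.cudnn.allow_tf32 = False
        print("TF32 disabled for this process")
```

These flags would need to be set at the top of the training script, before any model code runs. If the images look normal afterwards, reduced TF32 precision is a likely contributor; if not, the cause lies elsewhere.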

Install the current stable or nightly PyTorch binary with CUDA 11.8 or 12.1 and it should work.