libcudnn_cnn_infer.so.8 error: libnvrtc.so not found, and torch._inductor.utils: [WARNING] not enough cuda cores to use max_autotune mode

I am trying to run DreamBooth on RunPod.

Unfortunately, the PyTorch team removed the older xformers version. I can't believe how smart they are. Now we have to use Torch 2; however, it is not working on RunPod.

Here are the errors and the steps I tried to solve the problem.

I installed Torch 2 via this command on a RunPod.io instance:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Everything installed perfectly fine
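
For reference, here is a quick check I wrote to list what the cu118 install actually pulled in (the exact nvidia-* dependency names are whatever the wheels declare, so treat the filter as an assumption):

# List the torch and NVIDIA runtime wheels that pip installed
# (the nvidia-* package set depends on the cu118 wheels)
from importlib import metadata

for dist in metadata.distributions():
    name = (dist.metadata["Name"] or "").lower()
    if name.startswith("nvidia-") or name in ("torch", "torchvision", "torchaudio"):
        print(name, dist.version)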

With Torch 1 and CUDA 11.7 I was not getting any errors, but with Torch 2 the error below is produced:

Could not load library libcudnn_cnn_infer.so.8. Error: libnvrtc.so: cannot open shared object file: No such file or directory

How can I fix this?

The instance is running Linux.

On Windows the same procedure works very well.

I am using the Automatic1111 web UI for Stable Diffusion.
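
In case it helps with debugging, here is a small sketch I ran to see where the pip-installed NVIDIA wheels keep their copy of libnvrtc (the nvidia/*/lib layout inside site-packages is an assumption on my side); those directories could then be added to LD_LIBRARY_PATH:

import glob
import os
import site

# Look for libnvrtc copies shipped inside the nvidia-* pip wheels
# (the nvidia/*/lib layout is an assumption; adjust if your site-packages differ)
for sp in site.getsitepackages():
    for hit in glob.glob(os.path.join(sp, "nvidia", "*", "lib", "libnvrtc*")):
        print("found:", hit)

print("LD_LIBRARY_PATH:", os.environ.get("LD_LIBRARY_PATH", "<empty>"))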

I couldn't solve the error above, so I did the following:

apt update
apt install sudo
sudo apt install nvidia-cudnn
sudo apt-get install python3-dev
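
To see whether these packages actually made the libraries from the error message resolvable, I ran a small check like this (my own sketch, not from any guide):

import ctypes

# Try to dlopen the two libraries named in the original error message;
# if either still fails, the dynamic loader cannot find them on this pod
for lib in ("libnvrtc.so", "libcudnn_cnn_infer.so.8"):
    try:
        ctypes.CDLL(lib)
        print(f"{lib}: loaded OK")
    except OSError as err:
        print(f"{lib}: {err}")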

After installing all of the above, I now get this warning and training never progresses:

Steps: 0%| | 0/170 [00:00<?, ?it/s][2023-03-29 18:50:26,163] torch._inductor.utils: [WARNING] not enough cuda cores to use max_autotune mode
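
As far as I understand, this mode is requested wherever the trainer calls torch.compile; here is a minimal sketch of what I mean (the Linear model is just a placeholder, not the actual DreamBooth code), where asking for "default" instead of "max-autotune" should avoid the autotuning path:

import torch

# Placeholder model just to illustrate the compile call; in the real trainer
# this would be the UNet / text encoder being trained
model = torch.nn.Linear(8, 8).cuda()

# "max-autotune" is the mode the warning refers to; "default" skips that path
compiled = torch.compile(model, mode="default")

x = torch.randn(4, 8, device="cuda")
print(compiled(x).shape)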

Now when I run the Python code below, everything looks fine:

import torch

# Check if CUDA is available
if torch.cuda.is_available():
    print("CUDA is available")
    # Display the current GPU name
    print("GPU name: ", torch.cuda.get_device_name(torch.cuda.current_device()))
else:
    print("CUDA is not available")

# Verify the PyTorch version
print("PyTorch version: ", torch.__version__)

# Number of streaming multiprocessors (SMs) on the current GPU
print(torch.cuda.get_device_properties(0).multi_processor_count)

test.py result

CUDA is available
GPU name:  NVIDIA RTX A4500
PyTorch version:  2.0.0+cu118
56

It is able to generate images at 15.58 it/s, which is very fast.

Any help is appreciated very much.

The first issue is related to this topic.

The Inductor warning is raised from here, which indicates that max_autotune mode requires GPUs with a minimum SM count of 80, while your GPU seems to have fewer.

Does the RTX A4500 have fewer than 80?

You can check it via print(torch.cuda.get_device_properties(index).multi_processor_count) where index corresponds to the device index in your system.
Based on a quick search for the specs of the RTX A4500 it seems its SM count is 56.
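
As a rough illustration of that check (the threshold of 80 is the one mentioned above; the rest is just a sketch, not the actual Inductor implementation):

import torch

# Sketch only: pick a torch.compile mode based on the SM count, using the
# threshold of 80 mentioned above (not the actual Inductor code path)
MIN_SMS_FOR_MAX_AUTOTUNE = 80

sm_count = torch.cuda.get_device_properties(0).multi_processor_count
mode = "max-autotune" if sm_count >= MIN_SMS_FOR_MAX_AUTOTUNE else "default"
print(f"SM count: {sm_count} -> suggested torch.compile mode: {mode}")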

OK, so now the issue is: with PyTorch 1.13 and xformers compiled for 1.13, everything works perfectly, but with Torch 2 we get these errors and can't train on the same card.

What do you think about it?