libcudnn_cnn_infer.so.8 error: libnvrtc.so not found, and torch._inductor.utils: [WARNING] not enough cuda cores to use max_autotune mode

I am trying to run DreamBooth on RunPod.

Unfortunately, the PyTorch team removed the older xformers version. I can't believe how smart they are. Now we have to use Torch 2; however, it is not working on RunPod.

Here are the errors and the steps I tried to solve the problem.

I installed Torch 2 via this command on a RunPod.io instance:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Everything installed perfectly fine
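
For reference, here is a quick check I wrote to list what the cu118 install actually pulled in (the exact nvidia-* dependency names are whatever the wheels declare, so treat the filter as an assumption):

# List the torch and NVIDIA runtime wheels that pip installed
# (the nvidia-* package set depends on the cu118 wheels)
from importlib import metadata

for dist in metadata.distributions():
    name = (dist.metadata["Name"] or "").lower()
    if name.startswith("nvidia-") or name in ("torch", "torchvision", "torchaudio"):
        print(name, dist.version)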

With Torch 1 and CUDA 11.7 I was not getting any errors, but with Torch 2 the error below is produced:

Could not load library libcudnn_cnn_infer.so.8. Error: libnvrtc.so: cannot open shared object file: No such file or directory

How can I fix this?

The instance is running Linux.

On Windows the same procedure works very well.

I am using the Automatic1111 web UI for Stable Diffusion.
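
In case it helps with debugging, here is a small sketch I ran to see where the pip-installed NVIDIA wheels keep their copy of libnvrtc (the nvidia/*/lib layout inside site-packages is an assumption on my side); those directories could then be added to LD_LIBRARY_PATH:

import glob
import os
import site

# Look for libnvrtc copies shipped inside the nvidia-* pip wheels
# (the nvidia/*/lib layout is an assumption; adjust if your site-packages differ)
for sp in site.getsitepackages():
    for hit in glob.glob(os.path.join(sp, "nvidia", "*", "lib", "libnvrtc*")):
        print("found:", hit)

print("LD_LIBRARY_PATH:", os.environ.get("LD_LIBRARY_PATH", "<empty>"))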

I couldn't solve the error above, so I did the following:

apt update
apt install sudo
sudo apt install nvidia-cudnn
sudo apt-get install python3-dev
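
To see whether these packages actually made the libraries from the error message resolvable, I ran a small check like this (my own sketch, not from any guide):

import ctypes

# Try to dlopen the two libraries named in the original error message;
# if either still fails, the dynamic loader cannot find them on this pod
for lib in ("libnvrtc.so", "libcudnn_cnn_infer.so.8"):
    try:
        ctypes.CDLL(lib)
        print(f"{lib}: loaded OK")
    except OSError as err:
        print(f"{lib}: {err}")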

After installing all of the above, I now get this warning and training never progresses:

Steps: 0%| | 0/170 [00:00<?, ?it/s][2023-03-29 18:50:26,163] torch._inductor.utils: [WARNING] not enough cuda cores to use max_autotune mode
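
As far as I understand, this mode is requested wherever the trainer calls torch.compile; here is a minimal sketch of what I mean (the Linear model is just a placeholder, not the actual DreamBooth code), where asking for "default" instead of "max-autotune" should avoid the autotuning path:

import torch

# Placeholder model just to illustrate the compile call; in the real trainer
# this would be the UNet / text encoder being trained
model = torch.nn.Linear(8, 8).cuda()

# "max-autotune" is the mode the warning refers to; "default" skips that path
compiled = torch.compile(model, mode="default")

x = torch.randn(4, 8, device="cuda")
print(compiled(x).shape)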

Now when I run the Python code below, everything looks fine:

import torch

# Check if CUDA is available
if torch.cuda.is_available():
    print("CUDA is available")
    # Display the current GPU name
    print("GPU name: ", torch.cuda.get_device_name(torch.cuda.current_device()))
else:
    print("CUDA is not available")

# Verify the PyTorch version
print("PyTorch version: ", torch.__version__)

# Number of streaming multiprocessors (SMs) on the current GPU
print(torch.cuda.get_device_properties(0).multi_processor_count)

test.py result

CUDA is available
GPU name:  NVIDIA RTX A4500
PyTorch version:  2.0.0+cu118
56

It is able to generate images at 15.58 it/s, which is very fast.

Any help is appreciated very much.

The first issue is related to this topic.

The Inductor warning is raised from here, which indicates that max_autotune mode requires GPUs with a minimum SM count of 80, while your GPU seems to have fewer.

Does the RTX A4500 have fewer than 80?

You can check it via print(torch.cuda.get_device_properties(index).multi_processor_count) where index corresponds to the device index in your system.
Based on a quick search for the specs of the RTX A4500 it seems its SM count is 56.
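
As a rough illustration of that check (the threshold of 80 is the one mentioned above; the rest is just a sketch, not the actual Inductor implementation):

import torch

# Sketch only: pick a torch.compile mode based on the SM count, using the
# threshold of 80 mentioned above (not the actual Inductor code path)
MIN_SMS_FOR_MAX_AUTOTUNE = 80

sm_count = torch.cuda.get_device_properties(0).multi_processor_count
mode = "max-autotune" if sm_count >= MIN_SMS_FOR_MAX_AUTOTUNE else "default"
print(f"SM count: {sm_count} -> suggested torch.compile mode: {mode}")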

OK, so now the issue is: with PyTorch 1.13 and xformers compiled for 1.13, everything works perfectly, but with Torch 2 we get these errors and can't train on the same card.

What do you think about it?