Mixed Graphics cards issue, Turing & Ampere

I had what I thought was a good idea: run my display off my old GTX 1650 Super (Turing architecture) to free up all of the meager 12 GB of VRAM on my RTX 3060… I read about the environment variable

CUDA_VISIBLE_DEVICES, where one provides a comma-separated list of devices/ranges, e.g. 0,1 or 0-1, at least from what docs I read. Sadly it doesn’t seem to behave like that. I set it to 1, and instead of going to device 1, it trims its list to one item (card 0 is the 1650) and promptly blows up at the first call to flash_attention2. Unless it blows up first trying to allocate 8 GB of VRAM on a 6 GB card…

[rhiyddun@sayshell:/home/rhiyddun/petition]$nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 1650 SUPER (UUID: GPU-0c6bf56b-604a-97f5-e736-11c431cb7b1c)
GPU 1: NVIDIA GeForce RTX 3060 (UUID: GPU-191dca4a-789d-e46b-cd1a-a647fdc57efc)

Turing and Ampere just don’t mix, it seems. Is there some way to restrict torch to just the one card?

I disabled the env variable; then it starts on the RTX 3060, but falls over to the GTX 1650 once the RTX 3060 has allocated most of its memory, even after explicitly setting the device. There’s also a nomenclature issue…

[rhiyddun@sayshell:/home/rhiyddun/petition]$time python begin.py
<frozen importlib._bootstrap>:491: RuntimeWarning: overflow encountered in cast
None is in error
PyTorch current_device(): 0 # from torch.cuda.current_device()
Actual card name: NVIDIA GeForce RTX 3060
Memory: { 11.63067626953125 } GiB

torch calls it device 0, nvidia calls it device 1… who to believe? What to do?
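One likely explanation for the mismatch, assuming default settings: nvidia-smi enumerates GPUs in PCI bus order, while the CUDA runtime defaults to CUDA_DEVICE_ORDER=FASTEST_FIRST, which puts the faster 3060 at index 0. Forcing PCI bus order should make both numberings agree:

```shell
# nvidia-smi always lists GPUs in PCI bus order; the CUDA runtime
# defaults to CUDA_DEVICE_ORDER=FASTEST_FIRST, so the RTX 3060 lands
# at index 0 inside PyTorch. Forcing PCI order makes both tools agree:
export CUDA_DEVICE_ORDER=PCI_BUS_ID
export CUDA_VISIBLE_DEVICES=1   # with PCI ordering, 1 is the RTX 3060 in both tools
```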

I’d like to keep the acceleration if possible.

This is expected and is exactly how CUDA_VISIBLE_DEVICES works. The env variable makes only the specified devices visible to the process, and PyTorch then maps them to the device indices [0, num_devices-1].

If CUDA_VISIBLE_DEVICES=0 shows the 1650, use =1 instead to map the 3060 to cuda:0 inside the PyTorch script.
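To make the remapping concrete, here is a tiny pure-Python sketch of the filter-and-renumber behavior (the helper function and GPU name strings are illustrative, not a real CUDA API):

```python
# Illustrative helper (not a CUDA API): mimics how the runtime filters
# the physical GPU list by CUDA_VISIBLE_DEVICES and renumbers from 0.
def visible_devices(all_gpus, cuda_visible_devices):
    """Return (new_index, gpu) pairs the process would see."""
    if cuda_visible_devices is None:
        return list(enumerate(all_gpus))      # unset: everything visible
    kept = []
    for tok in cuda_visible_devices.split(","):
        idx = int(tok)
        if idx >= len(all_gpus):              # invalid entry truncates the list
            break
        kept.append(all_gpus[idx])
    return list(enumerate(kept))              # renumbered 0..n-1

gpus = ["GTX 1650 SUPER", "RTX 3060"]         # physical order, as in nvidia-smi -L
print(visible_devices(gpus, "1"))             # [(0, 'RTX 3060')] -> the 3060 is cuda:0
```

With CUDA_VISIBLE_DEVICES=1, the 1650 is hidden entirely and the 3060 becomes cuda:0 inside the script, which is why PyTorch and nvidia-smi report different indices for the same card.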