Identifying if GPU cores are used

Hi,

I have got the setup described in the image.

I'm running either on Windows or WSL2. Although CUDA is available, I can't get LLM inference to run on the GPU.
The models appear to be loaded into GPU memory, but inference seems to run on the CPU.

Is there a way to monitor GPU activity?

I installed torch via pip using the extra index URL for the CUDA build.

nvidia-smi should show the activity. As a quick check, move a large tensor to the device, confirm it actually landed there by checking its .device attribute, and then run a matmul in a loop while watching the utilization.
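
Here is a minimal sketch of that check (assuming the CUDA build of torch is installed; the matrix size is arbitrary, just large enough to keep the GPU busy for a few seconds):

import torch

# Confirm the CUDA build of torch can see the GPU at all.
print(torch.__version__, torch.version.cuda)   # cuda version is None on a CPU-only build
print(torch.cuda.is_available())               # should be True
print(torch.cuda.get_device_name(0))

# Move large tensors to the GPU and verify they landed there.
a = torch.randn(8192, 8192, device="cuda")
b = torch.randn(8192, 8192, device="cuda")
print(a.device)                                # should print cuda:0

# Run matmuls in a loop; in another terminal, nvidia-smi should show
# GPU-Util jumping close to 100% while this runs.
for _ in range(200):
    c = a @ b
torch.cuda.synchronize()                       # wait for the queued GPU work to finish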

Yeah, it’s possible for a model to be loaded onto the GPU (so memory usage shows up) but still have inference happen on the CPU if the inputs or some parts of the model aren’t on the GPU.

To monitor actual GPU usage, you can use:

  • Windows: Task Manager > Performance tab > GPU section. By default the graphs show the 3D/Copy/Video engines, so CUDA compute may not show up there; switch one of the graph dropdowns to “Cuda” to see compute usage.
  • WSL2 or Linux: Run nvidia-smi in the terminal. It shows GPU memory usage, running processes, and GPU utilization (look at the “Volatile GPU-Util” column).
  • For a more detailed view, try nvidia-smi dmon or a tool like nvtop. You can also poll utilization from Python, as sketched below.
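
Something like this should work for polling from a script (a rough sketch; it assumes the nvidia-ml-py / pynvml package is installed, which exposes the same NVML interface nvidia-smi uses):

import time
import pynvml  # pip install nvidia-ml-py

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)        # first GPU

# Poll for ~10 seconds while inference runs in another process/terminal.
for _ in range(10):
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"GPU util: {util.gpu}%  memory used: {mem.used / 1024**2:.0f} MiB")
    time.sleep(1)

pynvml.nvmlShutdown()

If the model is really only sitting in GPU memory, you'd see memory used stay high while GPU util stays near 0% during inference.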

Also, make sure both your model and your inputs are on the same device (cuda). A common issue is forgetting to move the input tensors to the GPU before inference:

model = model.to("cuda")
inputs = inputs.to("cuda")   # inputs must live on the same device as the model
output = model(inputs)
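
You can double-check where everything actually lives right before the forward pass:

print(next(model.parameters()).device)   # should print cuda:0
print(inputs.device)                     # should print cuda:0 as well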

If anything’s still unclear, feel free to share a code snippet; it might help track it down.