Weird profiler stack trace : Not sure if GPU is enabled on a local machine

I am currently learning about profiling with the PyTorch profiler and TensorBoard, following this tutorial: PyTorch Profiler With TensorBoard — PyTorch Tutorials 2.1.0+cu121 documentation.

The thing is that I tried it both on Google Colab and on my own local machine, which has an RTX 2080. When I run the exact tutorial code on Colab I obtain a report similar to the tutorial's, including the GPU information.
Yet when I run it on my machine (after calling .cuda() I checked that my model and data were on CUDA, and according to PyTorch they are), I obtain the following result:

This seems to indicate that all the operations were performed on the CPU (and that the device is CPU), while my device is cuda.

I believe it might be because I did not install the CUDA toolkit on my machine (though I did install torch using pip). But can this line:

device = "cuda" if torch.cuda.is_available() else "cpu"

return "cuda" if the CUDA toolkit is not installed on my machine? And is it possible to put the model and data on CUDA and still perform operations on the CPU?

And if that is possible and I am in fact performing operations on the CPU: I noticed that it is still much quicker with the model and data on CUDA than on the CPU. How can that be quicker?

Thank you for your time and help

Yes, since the PyTorch binaries ship with their own CUDA dependencies, you won’t need to install a local CUDA toolkit.

If the model and data were moved to the GPU, their operations will be performed on the GPU. Other tensors (on the CPU) can still perform operations on the CPU.
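A quick way to confirm where things actually live is to print the devices directly. This is a minimal sketch (the toy Linear model and tensor shapes are just placeholders, not your actual model):

```python
import torch

# Device selection as in the tutorial; note the device names are
# strings, otherwise bare cuda/cpu would be NameErrors.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = torch.nn.Linear(4, 2).to(device)
x = torch.randn(8, 4, device=device)

# Parameters and inputs report where they actually live.
print(next(model.parameters()).device)
print(x.device)

# An op runs on the device its tensors are on; mixing a CUDA tensor
# with a CPU tensor raises a RuntimeError rather than silently
# falling back to the CPU.
y = model(x)
print(y.device)
```

If all three prints say cuda:0, the forward pass really is running on the GPU.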

If you moved the model and data to the GPU all operations will be performed on the GPU.
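One thing worth double-checking (this is an assumption about the cause, not a diagnosis) is the activities list passed to the profiler: if ProfilerActivity.CUDA is missing, the report shows only CPU activity even when the ops did run on the GPU. A minimal sketch with a placeholder matmul workload:

```python
import torch
from torch.profiler import profile, ProfilerActivity

device = "cuda" if torch.cuda.is_available() else "cpu"

# ProfilerActivity.CUDA must be in the activities list for GPU
# kernels to appear in the trace at all.
activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)

a = torch.randn(256, 256, device=device)
b = torch.randn(256, 256, device=device)

with profile(activities=activities) as prof:
    c = a @ b

# On a working CUDA setup the table lists GPU kernel times next to
# the aten:: ops; on CPU it shows CPU times only.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```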

To your profiler question: I’m not familiar enough with the built-in profiler and use Nsight Systems instead, so you might also give it a try.

Thank you, I will have a look and try profiling with it; we’ll see if this one understands that the operations are done on the GPU.