It seems PyTorch doesn't use the GPU

First, I apologize for my poor English.

Recently, I bought an RTX 2060 for deep learning. I installed the GPU build of PyTorch with conda via conda install pytorch torchvision cudatoolkit=10.1 -c pytorch. Of course, I set up the NVIDIA driver too.

But when I ran my PyTorch code, training was very slow. So I checked Task Manager, and it seems torch isn't using the GPU at all!

Rather, as shown in the picture, the CPU was being used much more heavily than the GPU. torch.cuda.is_available() returns True, but the overall training speed and Task Manager's graph suggest torch can't utilize the GPU well. Is there something I missed? Here is my nvidia-smi output.

Sun Mar 29 15:25:26 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 431.91       Driver Version: 431.91       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2060   WDDM  | 00000000:01:00.0 Off |                  N/A |
| N/A   73C    P2    61W /  N/A |   2779MiB /  6144MiB |     46%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2984      C   …01\Anaconda3\envs\PyTorchGPU\python.exe        N/A |
|    0     10636      C   …01\Anaconda3\envs\PyTorchGPU\python.exe        N/A |
+-----------------------------------------------------------------------------+

Did you push your model to the GPU as shown below?

device = 'cuda'
model = model.to(device)
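
The input batches have to be moved to the same device as well. Here is a minimal, self-contained sketch of what I mean (the model, dataset, and loader below are placeholders, not your code):

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = nn.Linear(35, 10).to(device)  # placeholder model

# placeholder dataset with random inputs and targets
dataset = TensorDataset(torch.randn(256, 35), torch.randint(0, 10, (256,)))
train_loader = DataLoader(dataset, batch_size=64)

for data, target in train_loader:
    # the inputs must live on the same device as the model
    data, target = data.to(device), target.to(device)
    output = model(data)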

Yes. And I'm using nn.Sequential with .to(device):

self.middle_fc = nn.Sequential(
            nn.Linear(35,1024), 
            nn.ReLU(),
            nn.Linear(1024,248),
            nn.ReLU(),
            nn.Linear(248,128),
            nn.ReLU()
        ).to(device)

I had actually missed the nvidia-smi output:
It shows that 2779MiB / 6144MiB (46%) is being utilized. This implies that the model was successfully loaded onto the GPU. One empirical way to verify this is to time the run using device = 'cpu' and then using device = 'cuda' and compare the runtimes for a batch size greater than 1 (preferably, keep the batch size as high as possible). If the runtimes are the same, there is indeed some issue. I really don't have an answer for the 2% part, though.
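
A minimal sketch of such a timing comparison could look like this (the batch size, iteration count, and the reuse of your middle_fc layer sizes are assumptions on my part):

import time
import torch
import torch.nn as nn

def time_forward(device, batch_size=1024, iters=50):
    # same layer sizes as the posted middle_fc block, random data just for timing
    model = nn.Sequential(
        nn.Linear(35, 1024), nn.ReLU(),
        nn.Linear(1024, 248), nn.ReLU(),
        nn.Linear(248, 128), nn.ReLU(),
    ).to(device)
    x = torch.randn(batch_size, 35, device=device)

    # warm-up so one-time CUDA setup isn't counted
    for _ in range(5):
        model(x)
    if device == 'cuda':
        torch.cuda.synchronize()

    start = time.time()
    for _ in range(iters):
        model(x)
    if device == 'cuda':
        # wait for all pending kernels before stopping the timer
        torch.cuda.synchronize()
    return (time.time() - start) / iters

print('cpu :', time_forward('cpu'))
if torch.cuda.is_available():
    print('cuda:', time_forward('cuda'))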

I’ll be back with MNIST training😁

I assume this refers to the GPU activity shown in the Task Manager.
If so, note that the Windows Task Manager doesn't show the compute tab by default, and you would have to select it via the drop-down menu.
I would recommend sticking to nvidia-smi to see the device utilization.


I found that when I run the code with cuda, nvidia-smi prints something different than when running on cpu!

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 431.91       Driver Version: 431.91       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2060   WDDM  | 00000000:01:00.0 Off |                  N/A |
| N/A   58C    P2    29W /  N/A |   5148MiB /  6144MiB |     11%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2984      C   …01\Anaconda3\envs\PyTorchGPU\python.exe        N/A |
|    0      5480    C+G   …01\Anaconda3\envs\PyTorchGPU\python.exe        N/A |
|    0     10256      C   …01\Anaconda3\envs\PyTorchGPU\python.exe        N/A |
|    0     10636      C   …01\Anaconda3\envs\PyTorchGPU\python.exe        N/A |
+-----------------------------------------------------------------------------+

Could you let me know what the ‘11%’ means?

The 11% refers to the relative time spent executing kernels on the device within a reference timeframe.
E.g., in the last 100 ms, 11 ms were spent executing a kernel on the device.

You can increase the GPU utilization by pushing bigger workloads to the device, since e.g. a lot of small computations will add the overhead of the kernel launches etc.
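
As a rough, self-contained illustration (the sizes are made up, and it assumes a CUDA device is available): one large matmul keeps the GPU busy in a single kernel, while many tiny matmuls spend a large share of the time on kernel launches, so utilization stays low.

import time
import torch

device = 'cuda'
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

torch.cuda.synchronize()
t0 = time.time()
c = a @ b                       # one large kernel
torch.cuda.synchronize()
print('one large matmul   :', time.time() - t0)

a_small = torch.randn(64, 64, device=device)
b_small = torch.randn(64, 64, device=device)

torch.cuda.synchronize()
t0 = time.time()
for _ in range(10000):
    c = a_small @ b_small       # many tiny kernels; launch overhead dominates
torch.cuda.synchronize()
print('10000 small matmuls:', time.time() - t0)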

I understand! Thank you so much.

I found that the GPU is generally faster than the CPU in my MNIST example.
I think the problem was the batch size: it was so small that the difference between GPU and CPU was low. Thanks a lot!!
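
For reference, the change basically boils down to the batch_size argument of the DataLoader (a minimal sketch; the dataset setup below is a placeholder, not my exact code):

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# placeholder MNIST setup; a larger batch_size gives each kernel more work
# per launch, which should utilize the GPU better than tiny batches
train_set = datasets.MNIST('data', train=True, download=True,
                           transform=transforms.ToTensor())
small_loader = DataLoader(train_set, batch_size=4)     # likely leaves the GPU mostly idle
large_loader = DataLoader(train_set, batch_size=512)   # keeps the GPU much busier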

Selecting the Cuda tab (instead of 3D, Video Encode, or whatever is the default) in Task Manager worked for me.