Is my model using CUDA/GPU?

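Sat Jan 12 10:58:39 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   46C    P0    77W / 149W |    514MiB / 11441MiB |     38%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+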

From this output: since there are no processes, I’m guessing it’s not using CUDA?

I don’t think you are using GPU. If GPU is used, you’ll see something like this:

To check whether you have CUDA access inside your code, you can use:

>>> import torch
>>> print(torch.cuda.is_available())

If it returns False, I highly recommend checking which CUDA version you have installed and making sure that you have installed the corresponding PyTorch version.
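For a bit more detail than a single True/False, here is a minimal sketch of a diagnostic helper. The function name `cuda_report` and the dictionary keys are made up for illustration; the PyTorch calls it relies on (`torch.cuda.is_available()`, `torch.version.cuda`, `torch.cuda.device_count()`, `torch.cuda.get_device_name()`) are standard ones. It assumes nothing about your setup and simply reports a CPU-only state if PyTorch is missing.

```python
def cuda_report():
    """Collect basic facts about PyTorch's view of CUDA (hypothetical helper)."""
    report = {
        "torch_installed": False,
        "torch_cuda_version": None,  # CUDA version PyTorch was built against
        "cuda_available": False,
        "devices": [],
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment; nothing more to check.
        return report

    report["torch_installed"] = True
    # torch.version.cuda is None for CPU-only builds of PyTorch.
    report["torch_cuda_version"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["devices"] = [
            torch.cuda.get_device_name(i)
            for i in range(torch.cuda.device_count())
        ]
    return report


report = cuda_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

If `torch_cuda_version` is None while the machine does have a GPU, that usually points to a CPU-only PyTorch build rather than a driver problem.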

Check CUDA version:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17

To install PyTorch 1.0 for CUDA 8.0 and above, follow this link:
$ conda install pytorch torchvision cuda80 -c pytorch

To install a previous version of PyTorch for CUDA 9.0 and below, follow this link:
$ conda install pytorch=0.4.1 cuda80 -c pytorch

Not necessarily. It looks like the GPU utilization is at 38%.
I’ve seen some weird behavior in nvidia-smi: sometimes showing all processes, sometimes not showing any processes at all on certain servers.
Could you use @alwynmathew’s suggestions to check if CUDA is available?
If so, check whether your data (as well as the model parameters and target) is on the GPU using print(tensor.device).
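The device check could look roughly like the sketch below. The helper name `parameter_devices` and the toy `Linear` model are hypothetical stand-ins for your own code; the only things taken from PyTorch are `model.parameters()`, the `.device` attribute, and `.to(device)`. The import is guarded only so the helper can still be loaded on a machine without PyTorch.

```python
try:
    import torch
except ImportError:
    # Lets the helper below be imported even without PyTorch installed.
    torch = None


def parameter_devices(model):
    """Return the set of devices (as strings) that hold the model's parameters."""
    return {str(p.device) for p in model.parameters()}


if torch is not None:
    # Use the GPU when PyTorch can see one, otherwise fall back to the CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    model = torch.nn.Linear(4, 2).to(device)  # toy model, for illustration only
    batch = torch.randn(8, 4).to(device)      # toy input batch

    print("parameters on:", parameter_devices(model))
    print("batch on:", batch.device)
```

If this prints only "cpu" even though CUDA is available, the model or the data was never moved with `.to(device)` (or `.cuda()`), and the computation is running on the CPU.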

CUDA is available, and doing a torch computation with GPU works

I don’t think that is 38% GPU utilization, because if you look at an earlier column it says 514/11441 MiB are used, which is only around 5% utilization…

Also, we don’t need to explicitly install nvidia-384 or similar in the Dockerfile, right?

If you look at the Dockerfile here, they never explicitly install nvidia-384

The memory usage does not correspond to the GPU utilization. As far as I know, GPU-Util gives the percentage of time during which a kernel was executing over the last sample period.
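To make the difference concrete, here is a small sketch that pulls both numbers out of the status row posted at the top of the thread. The row is copied from that output; the function name `parse_status_row` and the regular expressions are my own and assume the row keeps this layout.

```python
import re

# Status row copied from the nvidia-smi output in the original post.
ROW = "| N/A 46C P0 77W / 149W | 514MiB / 11441MiB | 38% Default |"


def parse_status_row(row):
    """Return (memory_used_mib, memory_total_mib, gpu_util_percent)."""
    mem = re.search(r"(\d+)MiB\s*/\s*(\d+)MiB", row)
    util = re.search(r"(\d+)%", row)
    if mem is None or util is None:
        raise ValueError("row does not look like an nvidia-smi status row")
    return int(mem.group(1)), int(mem.group(2)), int(util.group(1))


used, total, util = parse_status_row(ROW)
memory_percent = 100.0 * used / total

# Two independent measurements: how full the memory is vs. how busy the GPU is.
print(f"memory in use: {used}/{total} MiB = {memory_percent:.1f}%")
print(f"GPU-Util:      {util}%")
```

The first figure says how much of the card's memory is allocated, the second how much of the time the GPU was busy, so a small memory footprint alongside 38% utilization is not a contradiction.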

No, I think the drivers should be installed directly on the host machine and not in the Dockerfile. At least that was my last workflow.