PyTorch: CUDA synchronize out of memory

Calls to almost all CUDA functions are failing with an out-of-memory error, even though the basic queries below still succeed:

In [2]: torch.cuda.is_available()
Out[2]: True

In [3]: torch.cuda.device_count()
Out[3]: 2

In [4]: torch.cuda.device(1)
Out[4]: <torch.cuda.device at 0x7f0024dc7668>

In [5]: torch.cuda.device(0)
Out[5]: <torch.cuda.device at 0x7f0024dc7f98>
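
For context, the queries above don't allocate any device memory (torch.cuda.device(1) only constructs a context manager without entering it), which is why they still succeed. The error typically shows up on the first call that lazily initializes the CUDA context, such as a synchronize or an allocation. A hypothetical minimal reproduction, not taken from the original session:

import torch

torch.cuda.is_available()    # succeeds: only queries the driver
torch.cuda.device_count()    # succeeds: still no context created
torch.cuda.synchronize()     # lazily creates the context on GPU 0 and
                             # can already fail with out of memory
x = torch.zeros(1).cuda()    # likewise fails if context creation needs
                             # more memory than GPU 0 has free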

nvidia-smi:

Sat Nov  4 20:32:12 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.82                 Driver Version: 375.82                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 0000:01:00.0      On |                  N/A |
| 97%   67C    P2    83W / 198W |   7831MiB /  8105MiB |     94%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 0000:02:00.0     Off |                  N/A |
| 34%   56C    P2    49W / 200W |   6172MiB /  8114MiB |     25%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0       702    G   /usr/lib/xorg-server/Xorg                      169MiB |
|    0       942    G   /usr/bin/gnome-shell                           128MiB |
|    0      1743    G   ...el-token=CD354235E476D5C9CE534143076E615F    45MiB |
|    0      6863    C   python                                         287MiB |
|    0     14361    C   python                                        1985MiB |
|    0     14705    G   /usr/bin/nvidia-settings                         0MiB |
|    0     31964    C   python                                        5211MiB |
|    1       702    G   /usr/lib/xorg-server/Xorg                        7MiB |
|    1      6863    C   python                                        6159MiB |
|    1     14705    G   /usr/bin/nvidia-settings                         0MiB |
+-----------------------------------------------------------------------------+

It's weird given that there are other training sessions running on both GPUs which are not OOM-ing, and there is still free memory left!

$ uname -r
4.9.48-1-MANJARO
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
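
As an aside, the "enough free memory" claim can also be checked from Python via NVML rather than nvidia-smi. A sketch, assuming the pynvml package is installed (not part of the original session):

import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)   # values are in bytes
    print("GPU %d: %.0f MiB free of %.0f MiB"
          % (i, mem.free / 1024**2, mem.total / 1024**2))
pynvml.nvmlShutdown()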

These other sessions likely have cached blocks (we use a caching memory allocator).
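
In newer PyTorch versions you can inspect this from inside one of those sessions; a rough sketch (function availability and names depend on the version, e.g. memory_cached was later renamed memory_reserved):

import torch

device = 0
# Memory currently held by live tensors in this process
print(torch.cuda.memory_allocated(device))
# Memory the caching allocator has reserved from the driver, including
# cached blocks that are free for reuse inside this process but still
# show up as "used" in nvidia-smi
print(torch.cuda.memory_cached(device))
# Return unused cached blocks to the driver so other processes
# (or a new context) can claim them
torch.cuda.empty_cache()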

When you initially make a CUDA call, it will create a CUDA context and a THC context on the primary GPU (GPU 0), and for that I think it needs around 200 MB. That's right at the edge of how much memory you have left.
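
If you can't free memory on GPU 0, one workaround is to keep the new process off it entirely so the context is created on GPU 1 instead. A sketch (not from the original thread):

# Option 1: hide GPU 0 from the process before CUDA is initialized;
# inside the process the physical GPU 1 then appears as device 0.
#   CUDA_VISIBLE_DEVICES=1 python train.py

# Option 2: select the device before the first real CUDA call so the
# lazy context creation should happen on GPU 1 rather than GPU 0.
import torch
torch.cuda.set_device(1)
x = torch.zeros(1).cuda()   # context gets created on GPU 1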


Ah that explains it, thanks!

I quit one of the sessions on GPU0 and everything was back to normal.