I have implemented a variant of RetinaNet (https://github.com/milani/science-bowl-2018). It runs fine on a single GTX 1070 (8GB). I then moved it to a new system with two 1080 Ti cards (2x11GB) and increased the batch size from 1 to 2. After a few iterations, I get a CUDA out-of-memory error.
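For context, the multi-GPU part is essentially the standard nn.DataParallel pattern. This is a simplified sketch with placeholder names (RetinaNet, compute_loss, train_loader), not the exact code from the repo:

import torch
import torch.nn as nn

# Sketch only: RetinaNet, compute_loss and train_loader are placeholders for the repo code.
model = nn.DataParallel(RetinaNet()).cuda()   # replicas on GPU 0 and GPU 1
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for images, targets in train_loader:          # batch_size=2 -> one image per replica
    images = images.cuda()
    outputs = model(images)                   # forward pass split across both GPUs
    loss = compute_loss(outputs, targets)     # placeholder loss function
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()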
When I look at the nvidia-smi output, the memory allocation is uneven:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30                 Driver Version: 390.30                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
| 44%   75C    P2   159W / 250W |  10157MiB / 11178MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:03:00.0  On |                  N/A |
| 46%   78C    P2   166W / 250W |   2514MiB / 11175MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1279      G   /usr/lib/xorg/Xorg                             9MiB |
|    0      1328      G   /usr/bin/gnome-shell                           6MiB |
|    0     31639      C   python3                                    10109MiB |
|    1      1279      G   /usr/lib/xorg/Xorg                            14MiB |
|    1     31639      C   python3                                     2467MiB |
+-----------------------------------------------------------------------------+
Is it normal to have ~10GB allocated on one GPU and only ~2.4GB on the other? What do you think is the reason I run out of memory after increasing batch_size? How should I debug this?
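If it helps, I can instrument the training loop with per-GPU memory stats and report back, something like this (a quick sketch using the torch.cuda memory API; memory_cached() was later renamed memory_reserved() in newer PyTorch versions):

import torch

def log_gpu_memory(tag=""):
    # Print how much memory the PyTorch allocator is holding on each GPU.
    for i in range(torch.cuda.device_count()):
        alloc = torch.cuda.memory_allocated(i) / 1024**2
        cached = torch.cuda.memory_cached(i) / 1024**2  # memory_reserved() in newer PyTorch
        print("{} GPU{}: allocated={:.0f}MiB cached={:.0f}MiB".format(tag, i, alloc, cached))

# e.g. call log_gpu_memory("after backward") once per iteration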