Uneven GPU memory usage

I have implemented a variant of RetinaNet (https://github.com/milani/science-bowl-2018). It runs fine on a single GTX 1070 (8 GB). I then moved it to a new system with two 1080 Ti cards (2x 11 GB) and increased the batch size from 1 to 2. After a few iterations, I get a CUDA out of memory error.

When I look at the nvidia-smi output, the memory allocation across the two GPUs is uneven:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30                 Driver Version: 390.30                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
| 44%   75C    P2   159W / 250W |  10157MiB / 11178MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:03:00.0  On |                  N/A |
| 46%   78C    P2   166W / 250W |   2514MiB / 11175MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1279      G   /usr/lib/xorg/Xorg                             9MiB |
|    0      1328      G   /usr/bin/gnome-shell                           6MiB |
|    0     31639      C   python3                                    10109MiB |
|    1      1279      G   /usr/lib/xorg/Xorg                            14MiB |
|    1     31639      C   python3                                     2467MiB |
+-----------------------------------------------------------------------------+

Is it normal to have 10 GB allocated on one GPU and only 2.4 GB on the other? What do you think is causing the out-of-memory error after increasing batch_size? How should I debug this?
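
In case it is useful, I was planning to log per-GPU memory from inside the training loop with something like the sketch below. This is just my guess at a reasonable check, using torch.cuda.memory_allocated and torch.cuda.max_memory_allocated; the helper name, the tag argument, and where to call it are made up for illustration, not code from the repo.

```python
import torch

def log_gpu_memory(tag=""):
    # Print how much memory PyTorch's caching allocator currently holds in
    # tensors on each visible GPU, plus the peak value seen so far.
    # Note: these numbers track tensors only, so they will be lower than what
    # nvidia-smi reports for the whole process (CUDA context, allocator cache).
    for i in range(torch.cuda.device_count()):
        allocated_mb = torch.cuda.memory_allocated(i) / 1024 ** 2
        peak_mb = torch.cuda.max_memory_allocated(i) / 1024 ** 2
        print(f"{tag} GPU {i}: {allocated_mb:.0f} MiB allocated, "
              f"{peak_mb:.0f} MiB peak")

# Hypothetical usage, e.g. once per iteration in the training loop:
# log_gpu_memory(tag="iter {}".format(iteration))
```

Would calling something like this after each iteration be the right way to narrow down which device runs out of memory, or is there a better approach?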