How to debug causes of GPU memory leaks?

Hello! I am trying to use this technique to debug, but the amount of GPU memory in use seems to be an order of magnitude larger than the total size of the tensors being allocated.

After running a forward pass on my network, I run the snippet suggested above in this thread:

import gc
import operator as op
from functools import reduce
import torch

# List every live tensor (or Variable wrapping one) that the garbage collector can see.
for obj in gc.get_objects():
    if torch.is_tensor(obj) or (hasattr(obj, 'data') and torch.is_tensor(obj.data)):
        print(reduce(op.mul, obj.size()) if len(obj.size()) > 0 else 0, type(obj), obj.size())

GPU memory used is around 10 GB after a couple of forward/backward passes, and this is everything the loop reports:

(161280, <class 'torch.autograd.variable.Variable'>, (5, 14, 3, 24, 32))
(451584, <class 'torch.autograd.variable.Variable'>, (14, 14, 3, 24, 32))
(612864, <class 'torch.autograd.variable.Variable'>, (19, 14, 3, 24, 32))
(612864, <class 'torch.autograd.variable.Variable'>, (19, 14, 3, 24, 32))
(2, <class 'torch.autograd.variable.Variable'>, (2,))
(420, <class 'torch.autograd.variable.Variable'>, (30, 1, 14))
(1026000, <class 'torch.autograd.variable.Variable'>, (19, 15, 450, 8))
(202, <class 'torch.autograd.variable.Variable'>, (2, 101))
(0, <class 'torch.autograd.variable.Variable'>, ())
(3, <class 'torch.autograd.variable.Variable'>, (3,))
(70, <class 'torch.autograd.variable.Variable'>, (5, 14))
(45, <class 'torch.autograd.variable.Variable'>, (45,))
(13230, <class 'torch.autograd.variable.Variable'>, (90, 3, 7, 7))
(90, <class 'torch.autograd.variable.Variable'>, (90,))
(10, <class 'torch.autograd.variable.Variable'>, (10,))
(735, <class 'torch.autograd.variable.Variable'>, (15, 1, 7, 7))
(15, <class 'torch.autograd.variable.Variable'>, (15,))
(8, <class 'torch.autograd.variable.Variable'>, (1, 1, 1, 8, 1))
(808, <class 'torch.autograd.variable.Variable'>, (101, 8))
(101, <class 'torch.autograd.variable.Variable'>, (101,))
(3, <class 'torch.autograd.variable.Variable'>, (3,))
(70, <class 'torch.autograd.variable.Variable'>, (5, 14))
(45, <class 'torch.autograd.variable.Variable'>, (45,))
(13230, <class 'torch.autograd.variable.Variable'>, (90, 3, 7, 7))
(90, <class 'torch.autograd.variable.Variable'>, (90,))
(10, <class 'torch.autograd.variable.Variable'>, (10,))
(735, <class 'torch.autograd.variable.Variable'>, (15, 1, 7, 7))
(15, <class 'torch.autograd.variable.Variable'>, (15,))
(8, <class 'torch.autograd.variable.Variable'>, (1, 1, 1, 8, 1))
(808, <class 'torch.autograd.variable.Variable'>, (101, 8))
(101, <class 'torch.autograd.variable.Variable'>, (101,))
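
For reference, here is a small variant of the same loop that just tallies the element counts (assuming everything is float32, i.e. 4 bytes per element), which is where my rough total below comes from:

import gc
import operator as op
from functools import reduce
import torch

total_elems = 0
for obj in gc.get_objects():
    if torch.is_tensor(obj) or (hasattr(obj, 'data') and torch.is_tensor(obj.data)):
        total_elems += reduce(op.mul, obj.size()) if len(obj.size()) > 0 else 0

# Rough size of everything the loop found, at 4 bytes per element.
print('%d elements, ~%.1f MB' % (total_elems, total_elems * 4 / 1024.0 ** 2))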

You can see the biggest variable here holds only about a million elements (a few MB), and the whole listing adds up to under 3M elements, so altogether they shouldn't need much more than about 10 MB. Where is the hidden memory usage? My batch size is variable, as mentioned above, but I hit an OOM after only a few batches. Am I confused about something, or is it using 10-100x more memory than it should?
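
If it would help narrow this down, I can also report what PyTorch itself says it has allocated versus what its caching allocator is holding on to; a minimal sketch of the check I have in mind (assuming torch.cuda.memory_allocated and torch.cuda.memory_cached are available in my PyTorch version) is:

import torch

# Bytes currently occupied by live tensors, as tracked by PyTorch.
print('allocated: %.1f MB' % (torch.cuda.memory_allocated() / 1024.0 ** 2))
# Bytes the caching allocator has reserved on the GPU; this can be much larger
# than the sum of live tensors, since freed blocks are kept around for reuse.
print('cached:    %.1f MB' % (torch.cuda.memory_cached() / 1024.0 ** 2))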

Thanks for any insight!