Track memory allocated by PyTorch

Is there a reliable way to track the effective memory that PyTorch keeps allocated during the forward pass, including intermediate buffers that will be needed during the backward pass?
Unfortunately, a simple check of the tensors that are still referenced (via gc.get_objects()) gives only a lower bound.

For example consider the following code:

import torch
import torch.nn as nn
import torch.autograd as autograd
import gc

c1 = nn.Conv2d(4, 4, 3)
c2 = nn.Conv2d(4, 4, 3)
x = autograd.Variable(torch.randn(4, 4, 16, 16), requires_grad=True)

y = c2(c1(x)).mean()


# Print the size of every tensor that is still reachable from Python.
for v in gc.get_objects():
    if isinstance(v, torch.Tensor):
        print(v.size())

It outputs:
(4L, 4L, 3L, 3L)
(4L, 4L, 3L, 3L)
(4L, 4L, 16L, 16L)

while I would expect an additional tensor of size (4L, 4L, 14L, 14L), the output of c1, which is needed to compute the gradient with respect to the parameters of c2.
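
To make the expected size concrete, here is a minimal pure-Python sketch of the arithmetic. It assumes the nn.Conv2d defaults (stride 1, no padding, no dilation) and 4-byte float32 elements, which is what torch.randn produces by default. The helper names conv2d_output_shape and tensor_bytes are hypothetical, written only for this illustration, and are not part of PyTorch.

```python
def conv2d_output_shape(input_shape, out_channels, kernel_size, stride=1, padding=0):
    """Output shape of a 2D convolution (hypothetical helper, dilation assumed to be 1)."""
    n, _, h, w = input_shape
    h_out = (h + 2 * padding - kernel_size) // stride + 1
    w_out = (w + 2 * padding - kernel_size) // stride + 1
    return (n, out_channels, h_out, w_out)


def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor (assumes float32, i.e. 4 bytes per element)."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


x_shape = (4, 4, 16, 16)                          # shape of x in the question
c1_out = conv2d_output_shape(x_shape, 4, 3)       # input to c2, the tensor missing from the listing
c2_out = conv2d_output_shape(c1_out, 4, 3)        # output of c2, before .mean()

print(c1_out, tensor_bytes(c1_out))
print(c2_out, tensor_bytes(c2_out))
```

Under these assumptions the output of c1 has shape (4, 4, 14, 14) and occupies 12544 bytes, none of which shows up in the gc.get_objects() listing. This counts only the raw tensor data, so it is still an estimate and not a measurement of what the allocator actually holds.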