Running out of memory extracting inferences at a certain level in CNN

I have downloaded a pre-trained model and I am trying to extract the values at a certain layer while running images through it. For some reason, I can’t process more than around 70 images before I get a CUDA out of memory error. Is there something I am missing in my setup? I also tried batching with a data loader, but that hasn’t helped.

Is there something I need to be doing to manually remove the tensors I don’t need anymore?

Thanks in advance for the help!

import torch
from torchvision import datasets, models, transforms
from tqdm import tqdm

# load in pretrained model
device = torch.device('cuda:0')
model = models.resnet101(pretrained=True)

outputs = []

def hook(module, input, output):
    output = torch.reshape(output, (-1,))
    output = torch.unsqueeze(output, 0)
    outputs.append(output)  # collect the flattened activations

model.layer4[0].conv2.register_forward_hook(hook) # register hook to access specific layer
# model.cuda('cuda:1')

data_dir = 'data/tiny-imagenet-200/'

test_transforms = transforms.Compose([transforms.Resize(224),
                                      transforms.ToTensor()])  # assumed; the rest of this list is cut off in the post

dataset = {'predict' : datasets.ImageFolder(data_dir + "test/", test_transforms)}
dataloader = {'predict': torch.utils.data.DataLoader(dataset['predict'], batch_size=15, shuffle=True, num_workers=1)}

predictions = []
outputs = []
for inputs, labels in tqdm(dataloader['predict']):
    inputs = inputs.to(device)
    output = model(inputs)
    output = output.to(device)
    index = output.data.cpu().numpy().argmax()  # assumed; the right-hand side is cut off in the post

It looks like your hook is saving a reference to each CUDA tensor in the outputs list. As long as the list holds those references, the tensors can’t be freed, so GPU memory accumulates with every batch. You’ll want to change that.

You’re also invoking another round trip to the GPU with the output = line after the forward pass, and you then move the result back to the CPU anyway. You can get rid of that line as well.
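
For what it’s worth, here is a minimal sketch of a loop without the extra transfer. It uses a small stand-in model and random tensors instead of the resnet101 and ImageFolder from the question (both are assumptions, made only to keep the example self-contained), and it wraps the forward pass in torch.no_grad() so that no autograd graph is kept alive during inference.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Fall back to the CPU so the sketch also runs without a GPU.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Stand-in classifier (hypothetical); the question uses models.resnet101.
model = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(4, 10),
).to(device).eval()

# Random images and dummy labels in place of the ImageFolder dataset.
dataset = TensorDataset(torch.randn(30, 3, 32, 32),
                        torch.zeros(30, dtype=torch.long))
loader = DataLoader(dataset, batch_size=15, shuffle=False)

predictions = []
with torch.no_grad():                   # inference only: no autograd graph is built
    for inputs, labels in loader:
        inputs = inputs.to(device)      # the only host-to-device copy that is needed
        output = model(inputs)          # the result already lives on `device`
        # One predicted class index per image, stored as plain Python ints.
        predictions.extend(output.argmax(dim=1).cpu().tolist())
```

Storing plain Python ints rather than tensors means the predictions list holds nothing that points at GPU memory.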

Thanks for the reply! Sorry, I’m still a little new to PyTorch. When you say I’m saving a reference to the CUDA tensor, does that mean I should move it off the GPU before I save it? I’m trying to save all of those values so I can do some other work on them.

Yup! You should be saving the tensors on the CPU side. As long as something holds a reference to a tensor (e.g. a Python list), Python’s garbage collector can’t free it. So if you store CUDA tensors, GPU memory usage just grows until it is full, whereas CPU RAM is usually much larger.
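
As a rough sketch of that advice, the hook below stores a detached CPU copy of each activation instead of the CUDA tensor itself. The tiny network, the random batches, and the names features and handle are all hypothetical stand-ins; in the question the hook would be registered on model.layer4[0].conv2.

```python
import torch
from torch import nn

# Fall back to the CPU so the sketch also runs without a GPU.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Small stand-in network (hypothetical); the question hooks a resnet101 layer.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 8, kernel_size=3, padding=1),
).to(device).eval()

features = []

def hook(module, inputs, output):
    # detach() drops the autograd history and cpu() copies the values to host
    # memory, so the list never holds a reference to GPU storage.
    features.append(output.detach().flatten(start_dim=1).cpu())

handle = model[0].register_forward_hook(hook)

with torch.no_grad():
    for _ in range(3):                  # three random batches of 15 images
        batch = torch.randn(15, 3, 16, 16, device=device)
        model(batch)

handle.remove()                         # unregister the hook when finished
all_features = torch.cat(features)      # one row per image: (45, 8 * 16 * 16)
```

Note that flatten(start_dim=1) keeps one row per image, which differs from the reshape in the question that collapses the whole batch into a single row; which layout is preferable depends on the downstream work.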