Memory issues on GPU when creating feature matrices from pre-trained resnets

I’m currently processing video data, converting each video into three feature types: MFCC features for the audio, a directory of 160x160 torch tensors with the cropped faces from each frame (the videos are of Zoom conversations), and a [n_frames x 512] tensor of facial features computed for each frame with a pretrained ResNet. The code uses two pretrained nets from facenet_pytorch: MTCNN to locate and crop the face in each frame, and an InceptionResnetV1 instance trained on a facial recognition dataset to extract features from each cropped face. The code I’ve used to do this is:

import os
import torch
import cv2
import torchaudio

from pathlib import Path
from facenet_pytorch import MTCNN, InceptionResnetV1
from moviepy.editor import VideoFileClip

directory = Path(path)
videos = [v for v in directory.iterdir() if str(v).endswith('.mp4')]

for n, video in enumerate(videos):

    device_one = torch.device('cuda:0')
    device_two = torch.device('cuda:1')

    mtcnn = MTCNN(image_size=160, thresholds=[0.4, 0.5, 0.5], device=device_one)
    resnet = InceptionResnetV1(pretrained='vggface2').eval().to(device_two)

    num_none = 0
    num_frames = 0

    print('Processing video at {}'.format(str(video)))

    file_name = str(video).split('/')[-1][:-4]
    file_string = '/PATH/TO/PROCESSED/VIDEO/DATA/' + file_name

    # Create the output directory and its frames/ subdirectory if missing
    os.makedirs(file_string + '/frames', exist_ok=True)

    face_feature_tensor = torch.empty(0, 512)

    v = VideoFileClip(str(video))
    audio = v.audio
    sr = audio.fps
    audio = audio.to_soundarray()
    audio = torch.FloatTensor(audio).T
    audio = (audio[0] + audio[1]) / 2
    audio = torch.FloatTensor(audio)
    transform = torchaudio.transforms.MFCC(n_mfcc=256, melkwargs={'n_mels': 256})
    mfcc = transform(audio)
    torch.save(mfcc, file_string + '/' + file_name + '')

    cap = cv2.VideoCapture(str(video))

    while cap.isOpened():
        ret, frame = cap.read()

        if ret:
            num_frames += 1
            cv2.normalize(frame, frame, 0, 255, cv2.NORM_MINMAX)
            # 160 x 160 face frame
            out = mtcnn(frame)

            if out is None:
                # No face detected in this frame
                num_none += 1
            else:
                # 1 x 512 feature vector
                frame_features = resnet(out.unsqueeze(0).to(device_two))
                face_feature_tensor = torch.cat((face_feature_tensor.to('cpu'), frame_features.to('cpu')), axis=0)

                zeros = 5 - len(str(num_frames))
                frame_name = 'frame' + '0'*zeros
                frame_name = frame_name + str(num_frames)
                torch.save(out, file_string + '/frames/' + frame_name + '.pt')

            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
        else:
            # No more frames to read
            break

    cap.release()
    torch.save(face_feature_tensor, file_string + '/' + file_name + '')
    print('Frame loss perc.: {}'.format(num_none/num_frames))

At a certain point the program crashes and I receive an error saying that the GPU holding the ResNet has run out of memory. This is surprising since the GPU has around 50GB of memory and, if my code is right, it is only holding the ResNet model and a single 160x160 torch tensor of a cropped face at any one time.

Am I overlooking something, or is there a large memory leak somewhere in the code?


I cannot see any obvious error, but would recommend adding print statements that report the GPU memory usage (e.g. via print(torch.cuda.memory_summary())) to check if and where the memory might be increasing.
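
A minimal sketch of that kind of check, assuming a hypothetical helper named log_gpu_memory (it is not part of PyTorch) that is called once every few hundred frames. It relies on torch.cuda.memory_allocated and torch.cuda.max_memory_allocated, and returns a plain message when no GPU is visible:

```python
import torch


def log_gpu_memory(tag, device=None):
    """Return a one-line summary of the memory held by tensors on `device`.

    Hypothetical helper for illustration only; `device=None` means the
    current CUDA device.
    """
    if not torch.cuda.is_available():
        return f"{tag}: CUDA not available"
    allocated = torch.cuda.memory_allocated(device) / 1024**2
    peak = torch.cuda.max_memory_allocated(device) / 1024**2
    return f"{tag}: allocated={allocated:.1f} MiB, peak={peak:.1f} MiB"


# Example: call this inside the frame loop, e.g. every 100 frames.
print(log_gpu_memory("after frame 100"))
```

If the allocated figure keeps climbing from frame to frame while the model and input sizes stay fixed, something is holding on to tensors between iterations. In the code from the question, the device to pass would be device_two.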

Hi @ptrblck, thanks for your response. After a day of checking and doing exactly as you suggested, I found that the .eval() call in resnet = InceptionResnetV1(pretrained='vggface2').eval().to(device_two) was not working as I expected. The forward pass was still tracking gradients, so I’m guessing that no_grad was not set properly for the params in the pretrained network.

For those wondering how to fix this issue, it’s as simple as manually setting the network params such that they do not collect gradients:

    for param in resnet.parameters():
        param.requires_grad = False

Thanks again!

Good to hear you’ve found the issue.
Just a small clarification: calling model.eval() does not disable gradient calculation. It only changes the behavior of some layers (e.g. dropout is disabled and batchnorm layers use their running stats), so what you saw is expected.
Your approach of setting the .requires_grad attribute of all parameters to False is right, as is wrapping the forward pass in the with torch.no_grad() context manager.
The parameters do not have a no_grad attribute.
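
To make the distinction concrete, here is a small sketch that uses a stand-in model built from nn.Linear and nn.Dropout instead of the actual InceptionResnetV1 (an assumption made only to keep the example self-contained). It checks whether the output is still attached to the autograd graph in each case:

```python
import torch
import torch.nn as nn

# Stand-in for the pretrained network (hypothetical; not InceptionResnetV1).
model = nn.Sequential(nn.Linear(8, 4), nn.Dropout(p=0.5)).eval()
x = torch.randn(1, 8)

# eval() only switches layer behavior (dropout off, batchnorm running stats);
# autograd still records the forward pass.
out_eval = model(x)
print(out_eval.requires_grad)  # True

# Inside no_grad() no graph is recorded for the forward pass.
with torch.no_grad():
    out_no_grad = model(x)
print(out_no_grad.requires_grad)  # False

# Alternative from the thread: freeze every parameter.
for param in model.parameters():
    param.requires_grad = False
out_frozen = model(x)
print(out_frozen.requires_grad)  # False
```

An output that still requires grad keeps a reference to its computation graph, so concatenating such outputs frame after frame keeps every graph alive. In the loop from the question, wrapping the resnet(...) call in with torch.no_grad(): should have the same effect as freezing the parameters.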
