Feature extraction has unreasonable memory consumption

Hi all,

I want to extract RGB and flow features from video frames, using the resnet50 model for the RGB features and raft_large for the flow features. Unfortunately, even extracting features for a single frame (two frames for flow) makes my memory usage spike to ~400 GB (please see the screenshot).

Both models are loaded from torchvision with their default weights. To ensure the RGB and flow features have the same shape, the output of raft_large's last iteration is converted with torchvision's flow_to_image function, and the resulting image is then also fed into resnet50. Features are extracted at the 'flatten' layer of the network.

I’ve used the following code to disable gradient computation, hoping to reduce the model’s overhead, but the improvement is minor at best:

for param in model.parameters():
    param.requires_grad = False

What else could I do to decrease memory consumption?

Best, Jona

Depending on the input size, a huge memory consumption might be expected. Could you share the input shape(s) and more information about what exactly is stored?

The input shape of the images is (1, 3, 448, 448) (B, C, H, W). The example above uses only one such image to extract features (respectively two for flow).

In previous trials I used 32 images as input (shape = (32, 3, 448, 448)). The memory requirements were not that much greater, which fuels my suspicion that they do not depend strongly on the input size. However, besides disabling gradient computation I don’t see any way to reduce them further.

Best, Jona

Thanks for the update. Could you post a minimal, executable code snippet that reproduces the ~400 GB usage with this input shape, please?