Estimating GPU memory usage in PyTorch

Hi all, I am trying to estimate, ahead of time, the total amount of GPU memory that a PyTorch model will use during training.

As far as I understand, a model needs GPU memory for three things (the first two are easy to measure directly; see the sketch right after this list):

  1. its parameters,
  2. the input data for the network, and
  3. the output features/activations at each layer of the network.
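
A minimal sketch of how I count items 1 and 2 (AlexNet and the batch size of 128 here are just placeholders):

```python
import torch
import torchvision.models as models

model = models.alexnet()              # placeholder network
x = torch.randn(128, 3, 224, 224)     # placeholder batch, batch_size = 128

# 1. memory taken by the parameters, in bytes
param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())

# 2. memory taken by one batch of input data, in bytes
input_bytes = x.numel() * x.element_size()

print(f"parameters:  {param_bytes / 1024**2:.1f} MiB")
print(f"input batch: {input_bytes / 1024**2:.1f} MiB")
```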

Also, I am assuming that the features/activations and the parameters at each layer of the network are stored twice: once for the forward pass and once for the backward pass.
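(As a concrete example of the doubling: AlexNet's roughly 61 million float32 parameters alone would then come to about 2 × 61e6 × 4 bytes ≈ 0.5 GB.)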

Accordingly, the expression for estimating the GPU usage should be something like:

```
total_gpu_usage = 2 * batch_size * (input_data_size
                                    + sum(feature_size_at_each_network_layer))
                  + 2 * parameter_size
```
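
In code, the estimate I am computing looks roughly like the sketch below. This is only a sketch of my approach, not exact code: VGG-16, the batch size of 32, and the leaf-module forward hooks are assumptions made for illustration; all sizes are in bytes, and the batch dimension is already part of the tensor shapes, so there is no separate batch_size factor.

```python
import torch
import torchvision.models as models

def estimate_gpu_usage(model, x):
    """Estimate per the formula above, in bytes (batch size is baked into x)."""
    feature_bytes = []
    hooks = []

    def hook(module, inputs, output):
        # record the size of each layer's output features
        if torch.is_tensor(output):
            feature_bytes.append(output.numel() * output.element_size())

    # attach a forward hook to every leaf module
    for m in model.modules():
        if len(list(m.children())) == 0:
            hooks.append(m.register_forward_hook(hook))

    with torch.no_grad():
        model(x)
    for h in hooks:
        h.remove()

    input_bytes = x.numel() * x.element_size()
    param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
    return 2 * (input_bytes + sum(feature_bytes)) + 2 * param_bytes

model = models.vgg16()                    # placeholder network
x = torch.randn(32, 3, 224, 224)          # placeholder batch
print(f"estimated: {estimate_gpu_usage(model, x) / 1024**2:.1f} MiB")
```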

Unfortunately, this consistently and severely underestimates the actual GPU usage (almost always by a factor of 1.5 to 2) for different networks, including baseline architectures like AlexNet and VGG.
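
For concreteness, the "actual usage" I compare against is the peak allocation that PyTorch reports after one forward/backward pass, roughly as below (again only a sketch: the network, batch size, and dummy loss are placeholders, and this reads the caching allocator's statistics rather than what nvidia-smi shows):

```python
import torch
import torchvision.models as models

device = torch.device("cuda")
model = models.vgg16().to(device)                  # placeholder network
x = torch.randn(32, 3, 224, 224, device=device)    # placeholder batch

torch.cuda.reset_peak_memory_stats(device)
out = model(x)
out.sum().backward()                               # dummy loss just to trigger backward
torch.cuda.synchronize(device)

# peak bytes allocated for tensors, as tracked by the caching allocator
peak = torch.cuda.max_memory_allocated(device)
print(f"actual peak: {peak / 1024**2:.1f} MiB")
```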

Am I missing anything here? If so, please suggest possible remedies or link me to any working code that estimates GPU memory usage in PyTorch.

Thanks in advance!