Understanding memory consumption


I am trying to calculate the size occupied by the feature maps in a DNN model. For things like ReLU we can either use `nn.functional` or define ReLU as a layer in `__init__`. My question: in the `nn.functional` case, would we still treat the ReLU as a layer and use its inputs/outputs to calculate feature map size? When ReLU is defined as a layer in `__init__` that makes sense to me, but I am not sure it does when using `nn.functional.relu`, since that is not really a layer I have defined. Can someone please clarify? Thanks.
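To make the question concrete, here is a minimal sketch of the approach I have in mind: forward hooks record the output size of each registered layer, but a functional `F.relu` call never fires a hook, so its feature map has to be accounted for separately (its shape equals that of the preceding layer's output). `TinyNet` and the layer sizes are made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(8, 16)
        self.fc2 = nn.Linear(16, 4)

    def forward(self, x):
        # functional ReLU: produces a tensor the same shape as fc1's output
        return self.fc2(F.relu(self.fc1(x)))

sizes = []

def hook(module, inputs, output):
    # bytes occupied by this module's output feature map
    sizes.append((module.__class__.__name__,
                  output.numel() * output.element_size()))

net = TinyNet()
for m in net.modules():
    if not isinstance(m, TinyNet):
        m.register_forward_hook(hook)

net(torch.randn(2, 8))
total = sum(b for _, b in sizes)
# F.relu never triggers a hook; since its output matches fc1's output
# shape, count one extra copy of fc1's output size for it
feature_bytes = total + sizes[0][1]
```

With a batch of 2, `sizes` records the two `Linear` outputs only, and `feature_bytes` adds one more copy of `fc1`'s output to cover the functional ReLU.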

Memory allocations are tied to operations on tensors; layers just wrap groups of these operations together. That makes memory hard to measure precisely: layer code may create intermediate tensors, some of which are freed early while others are kept for backpropagation. Check out pytorch_memlab if you need a detailed profile.
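You can see the "kept for backpropagation" part directly with `torch.autograd.graph.saved_tensors_hooks`, which intercepts every tensor autograd stashes for the backward pass. A small CPU-only sketch (the shapes are arbitrary):

```python
import torch

saved = []

def pack(t):
    # record the size of each tensor autograd saves for backward
    saved.append(t.numel() * t.element_size())
    return t

def unpack(t):
    return t

x = torch.randn(2, 8, requires_grad=True)
w = torch.randn(16, 8, requires_grad=True)

with torch.autograd.graph.saved_tensors_hooks(pack, unpack):
    y = torch.relu(x @ w.t())  # matmul saves its inputs; relu saves a (2, 16) tensor
    loss = y.sum()             # sum saves nothing extra

loss.backward()
```

After the forward pass, `saved` lists the byte sizes of everything the graph retained; those tensors stay allocated until `backward()` runs, which is exactly why intermediate feature maps dominate training memory.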

By default ReLU allocates new memory for its output, and that memory is usually not released during the forward pass, since the next layer (linear/conv) keeps it around for its gradient calculations.
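A quick way to observe the default allocation, and the `inplace=True` alternative that reuses the input's storage, is to compare `data_ptr()` values (the tensors here are just examples):

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 4)
out = F.relu(x)                 # allocates a new tensor for the output

y = torch.randn(4, 4)
out2 = F.relu(y, inplace=True)  # overwrites y's storage, no new allocation
```

Note that `inplace=True` is only safe when autograd does not need the overwritten values; if it does, PyTorch raises an error during the backward pass.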