My question is: is it necessary to store the gradient/feature maps of the frozen (`requires_grad = False`) non-linear intermediate layers of a convolutional neural network (CNN)?
My question comes from the following observations I made while pre-training a network.
- If I fix (freeze) the low-level layers of a network and only update the weights of the higher-level layers, PyTorch frees some memory, since it no longer needs to save the feature and gradient maps for the frozen layers. So freezing the low-level layers saves memory (a sketch of this comparison follows the list).
- If I instead freeze only the intermediate layers (leaving both the low-level and high-level layers trainable), memory usage is the same as when I update all the weights (low, intermediate, and high-level).
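Here is a minimal sketch of the kind of comparison I mean. The toy model, the layer grouping, the sizes, and the `peak_mem_mib` helper are all made up for illustration, and it assumes a CUDA device is available:

```python
import torch
import torch.nn as nn

def peak_mem_mib(freeze_low):
    """One forward/backward pass; optionally freeze the low-level conv block."""
    torch.cuda.reset_peak_memory_stats()
    model = nn.Sequential(
        nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),    # "low-level" block
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),   # "intermediate" block
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),   # "high-level" block
    ).cuda()
    if freeze_low:
        for p in model[0].parameters():
            p.requires_grad = False     # freeze the first conv's weight/bias
    x = torch.randn(16, 3, 128, 128, device="cuda")
    model(x).sum().backward()
    return torch.cuda.max_memory_allocated() / 2**20

print(f"all layers trainable: {peak_mem_mib(False):.1f} MiB")
print(f"low-level frozen    : {peak_mem_mib(True):.1f} MiB")
```

When the leading block is frozen (and the input does not require grad), no gradient maps are computed for that region during backward, which is consistent with the lower peak memory I observe.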
So my question is: would it be possible for PyTorch to free the gradient/feature maps of these frozen intermediate layers to save memory?
My initial thought is that if all the frozen intermediate layers performed only linear operations (which is rarely the case in a CNN), we wouldn't need to save their gradient/feature maps, since the whole frozen block would collapse into a single linear map whose backward can be computed from the weights alone. But if the frozen block contains non-linear operations (relu, sigmoid, etc.; note that a conv is itself linear in its input), is it still necessary to store the gradient/feature maps of these non-linear frozen layers?
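To make the distinction concrete, here is a sketch using custom `torch.autograd.Function`s. The class names are mine, not PyTorch APIs, and these are hypothetical minimal re-implementations rather than the real kernels; the point is only to show which tensors each backward actually needs:

```python
import torch

class FrozenLinearMap(torch.autograd.Function):
    """y = x @ w with w frozen: backward w.r.t. x needs only the weight."""
    @staticmethod
    def forward(ctx, x, w):
        ctx.save_for_backward(w)        # the weight already lives in memory anyway
        return x @ w

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # grad w.r.t. x uses only w; no feature map had to be kept around
        return grad_out @ w.t(), None   # None: the frozen weight gets no grad

class ReLUMap(torch.autograd.Function):
    """y = relu(x): backward needs per-element info from the forward pass."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x > 0)    # an activation-sized mask must be stored
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_out):
        (mask,) = ctx.saved_tensors
        return grad_out * mask

x = torch.randn(4, 8, requires_grad=True)  # stand-in for trainable low-level output
w = torch.randn(8, 8)                      # frozen weight (requires_grad=False)
y = ReLUMap.apply(FrozenLinearMap.apply(x, w))
y.sum().backward()                         # gradient still flows back to x
```

If I understand correctly, this is the crux: even when a layer's weights are frozen, gradients must still flow *through* it to reach trainable layers below, and for a non-linear op like ReLU that backward pass requires an activation-sized tensor saved at forward time, which would explain why freezing only the intermediate layers doesn't reduce memory.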