I’ve pretrained a CNN module that I want to insert into object detection deep neural networks.
However, the module first increases the number of feature channels by a large factor and then reduces it back to the number of input channels. As a result, it has to store large intermediate feature maps during the forward pass (and large gradient maps during the backward pass), which takes up too much CUDA memory, even though all I actually need is the overall gradient of the module’s output w.r.t. the input feature map.
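For context, the module is roughly shaped like the sketch below (the class name, channel counts, and layers are only placeholders, not my real design):

```python
import torch.nn as nn

class ChannelExpandModule(nn.Module):
    """Placeholder: expand the channel dimension by a large factor,
    then shrink it back to the input channel count."""

    def __init__(self, in_channels=256, expansion=8):
        super().__init__()
        mid = in_channels * expansion  # e.g. 256 -> 2048 channels
        self.expand = nn.Conv2d(in_channels, mid, kernel_size=3, padding=1)
        self.act = nn.ReLU(inplace=True)
        self.reduce = nn.Conv2d(mid, in_channels, kernel_size=1)

    def forward(self, x):
        # The wide intermediate tensors here are what blow up CUDA memory:
        # autograd keeps them alive for the backward pass.
        return self.reduce(self.act(self.expand(x)))
```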
Since I don’t need the module parameters to be updated, I’ve set
```python
for name, p in ce.module.named_parameters():
    p.requires_grad = False
```
But, as shown in my previous post, setting requires_grad = False only stops PyTorch from computing gradients w.r.t. the layer weights; to keep the module differentiable w.r.t. its input, it still saves the intermediate activations, so the amount of data stored may not change.
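One way to see the effect (assuming ce.module sits on the GPU; the input shape below is just an example):

```python
import torch

x = torch.randn(1, 256, 128, 128, device="cuda", requires_grad=True)

before = torch.cuda.memory_allocated()
y = ce.module(x)  # intermediate activations are kept alive for backward
after = torch.cuda.memory_allocated()
print(f"forward keeps {(after - before) / 1024**2:.1f} MiB alive")
```

Most of that extra memory comes from the saved activations of the widened layers rather than from the output tensor itself.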
So I’m wondering whether it is possible to accumulate the module’s gradient (its Jacobian) layer by layer during the forward pass, so that during the backward pass we only need to multiply it with the gradients from the rest of the network, instead of saving the inputs, outputs, and gradients of every layer.
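Something along these lines is what I have in mind (the class name below and the use of torch.autograd.functional.jacobian are just my own sketch of the idea, not code I already have working):

```python
import torch
from torch.autograd.functional import jacobian

class PrecomputedJacobianFn(torch.autograd.Function):
    """Sketch: compute the frozen module's Jacobian w.r.t. its input during
    the forward pass and reuse it in backward, instead of letting autograd
    keep the intermediate activations of every layer until backward."""

    @staticmethod
    def forward(ctx, x, module):
        with torch.enable_grad():
            jac = jacobian(module, x)  # shape: out.shape + x.shape
        with torch.no_grad():
            out = module(x)
        ctx.save_for_backward(jac)
        ctx.in_shape = x.shape
        ctx.out_numel = out.numel()
        return out

    @staticmethod
    def backward(ctx, grad_out):
        (jac,) = ctx.saved_tensors
        # Vector-Jacobian product with the gradient coming from the rest
        # of the detection network.
        grad_in = grad_out.reshape(1, -1) @ jac.reshape(ctx.out_numel, -1)
        return grad_in.reshape(ctx.in_shape), None

# usage: out = PrecomputedJacobianFn.apply(feature_map, ce.module)
```

My worry is that the full Jacobian of one C×H×W feature map w.r.t. another is itself enormous, so presumably this would have to be done layer by layer or in some smarter way.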
Is it possible to do such a thing in PyTorch, or is there a more elegant way to do it?