How to calculate the gradient when using fixed-weight filters in multiple branches?

Miffy · October 4, 2023, 12:57pm

Recently, I have been implementing a paper at https://www.sciencedirect.com/science/article/pii/S0893608019301315.
The paper describes a branched CNN architecture in which the authors use fixed-weight filters after the general convolution layer to obtain additional feature maps.
They mention in the paper that during backpropagation, the gradients of these feature maps (including the original feature map) are in an accumulative state.

I have created a simple architecture diagram as follows. It’s not a very complex structure, but I am puzzled about how backpropagation works in this context.

ps.“Cat” means concatenate!

I know that Pytorch can automatically compute gradients and perform backpropagation by using .backward(), but do I need to add any specific instructions, or will Pytorch automatically handle the accumulative part for the original feature maps and additional feature maps for me?

Thank you very much for your responses!

ptrblck · October 4, 2023, 7:36pm

I don’t fully understand this statement since feature maps (assuming these refer to forward activations) are created for each new input in the forward pass and will thus not accumulate gradients. Trainable parameters will accumulate gradients in their .grad attribute, but you’ve also mentioned that the weight is fixed, so I assume it’s not trainable.

Miffy · October 5, 2023, 4:46am

Dear ptrblck,

I think that during backpropagation, the original feature maps are generated by the convolution layer, so the architecture I mentioned above can only update the convolution layer’s filter weights and the three fc layers during backpropagation.

I think there was something wrong with my previous statement, and I’d like to clarify by referring to the original text of the paper.

I think the original text shows that the error which is propagated to the original feature maps, includes the error from the additional feature maps, and the error is equal to the gradient, so the authors use the title Accumulative back-propagation.
That is my understanding, but I’m not sure if it is correct.

And, if my understanding is correct, will Pytorch automatically do the “accumulative back-propagation” for me or I should do it by myself?