Retain Graph: Branch Convolutional Neural Network

I am trying to implement B-CNN: Branch Convolutional Neural Network for Hierarchical Classification (https://arxiv.org/pdf/1709.09890.pdf).


I have a doubt:
Should I backpropagate the losses from the branches individually (using retain_graph=True)?

OR

Should I simply add the losses from all the branches and then perform backpropagation once?

Both are OK, though the latter seems better.

I wanted to know why the 2nd is better.

Is it because in the 2nd case I need only one backprop, compared to 3 backprops in the 1st case?

By better, I think @Naruto-Sasuke means that your code would “look” better and would have fewer lines, and therefore less surface area for software bugs. AFAIK, PyTorch's autograd tracks which tensors were involved in computing the summed loss and works out the required grads in a single backward pass, without us explicitly asking it to calculate the grads for each of the branches.
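To make this concrete, here is a minimal sketch comparing the two options. The gradient of a sum is the sum of the gradients, and .backward() accumulates into .grad, so both should give the same result. Note that TinyBranchNet, its layer sizes, and the random data are all hypothetical stand-ins (small linear layers instead of the conv blocks from the paper), so treat this as an illustration rather than the actual B-CNN.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class TinyBranchNet(nn.Module):
    """Hypothetical stand-in for a B-CNN: a shared trunk with three branch heads."""

    def __init__(self):
        super().__init__()
        self.block1 = nn.Linear(8, 16)
        self.block2 = nn.Linear(16, 16)
        self.block3 = nn.Linear(16, 16)
        self.coarse1_head = nn.Linear(16, 2)   # coarse level 1
        self.coarse2_head = nn.Linear(16, 4)   # coarse level 2
        self.fine_head = nn.Linear(16, 10)     # fine level

    def forward(self, x):
        h1 = torch.relu(self.block1(x))
        h2 = torch.relu(self.block2(h1))
        h3 = torch.relu(self.block3(h2))
        return self.coarse1_head(h1), self.coarse2_head(h2), self.fine_head(h3)


def branch_losses(model, x, targets):
    """Run one forward pass and return one cross-entropy loss per branch."""
    criterion = nn.CrossEntropyLoss()
    return [criterion(out, t) for out, t in zip(model(x), targets)]


def grads_of(model):
    """Copy the current gradients so they can be compared later."""
    return [p.grad.clone() for p in model.parameters()]


model = TinyBranchNet()
x = torch.randn(5, 8)
targets = (
    torch.randint(0, 2, (5,)),
    torch.randint(0, 4, (5,)),
    torch.randint(0, 10, (5,)),
)

# Option 1: one backward call per branch. The branches share the trunk,
# so the graph must be retained for every call except the last one.
model.zero_grad()
losses = branch_losses(model, x, targets)
for i, loss in enumerate(losses):
    loss.backward(retain_graph=(i < len(losses) - 1))
grads_separate = grads_of(model)

# Option 2: sum the branch losses and call backward once.
model.zero_grad()
total_loss = sum(branch_losses(model, x, targets))
total_loss.backward()
grads_summed = grads_of(model)

# Compare the gradients from the two options (up to float rounding).
same = all(
    torch.allclose(a, b, atol=1e-6)
    for a, b in zip(grads_separate, grads_summed)
)
print("gradients match:", same)
```

So the difference is not in the gradients you get. Option 2 just walks the shared part of the graph once instead of three times and does not need retain_graph at all.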

Hi @ganLover, is it possible to share the code used to implement the B-CNN architecture?