I am trying to implement an adaptation of the classical RetinaNet object detector. RetinaNet can be considered a multi-task network with classification and localization as separate tasks: it shares a common backbone with two heads, one for localization and one for classification.
The usual way to train this network is to compute separate losses L_class and L_loc for the two tasks and sum them:
L_total = L_class + L_loc.
The training procedure is then:
```python
optimizer.zero_grad()
L_class, L_loc = loss(retinanet(x))
L_total = L_class + L_loc
L_total.backward()
optimizer.step()
```
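where `optimizer` is a single optimizer over all model parameters, e.g.:

```python
import torch

# One optimizer covering backbone and both heads
optimizer = torch.optim.SGD(retinanet.parameters(), lr=1e-2, momentum=0.9)
```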
I would like to use two separate optimizers, one for the classification task and one for the localization task. My question is how to do this correctly, so that both heads are trained with their own optimizer (`opt_class`, `opt_loc`). I would propose this training loop:
```python
# Localization first
opt_loc.zero_grad()
_, L_loc = loss(retinanet(x))
L_loc.backward()
opt_loc.step()

# Classification second
opt_class.zero_grad()
L_class, _ = loss(retinanet(x))
L_class.backward()
opt_class.step()
```
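Here `opt_class` and `opt_loc` would be built from the parameters of the respective heads. A minimal sketch, assuming torchvision-style attribute names (`head.classification_head`, `head.regression_head`), which may differ in other implementations:

```python
import torch

# Hypothetical setup: attribute names follow torchvision's RetinaNet
# (retinanet.head.classification_head / retinanet.head.regression_head)
opt_class = torch.optim.SGD(retinanet.head.classification_head.parameters(),
                            lr=1e-2, momentum=0.9)
opt_loc = torch.optim.SGD(retinanet.head.regression_head.parameters(),
                          lr=1e-2, momentum=0.9)
# Note: the shared backbone parameters are in neither optimizer here;
# where they should go is part of what I am unsure about.
```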
Would this be a sound approach, and is it possible to do this without creating additional GPU memory usage?
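For reference, one variant I considered to avoid the second forward pass is to run both backward passes from a single forward, using `retain_graph=True` so the graph survives the first backward. This is only a sketch (same attribute-name assumptions as above) and part of what I would like feedback on:

```python
# Single forward pass; both losses come from the same graph
opt_loc.zero_grad()
opt_class.zero_grad()
L_class, L_loc = loss(retinanet(x))

# retain_graph=True keeps the intermediate activations alive so a
# second backward through the shared backbone is possible
L_loc.backward(retain_graph=True)
L_class.backward()

# Step both optimizers only after both backward passes, so neither
# backward runs through parameters that were already updated in place
opt_loc.step()
opt_class.step()
```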