Assume we have network 1, made of a stack of convolution layers (the final conv feature map has size (B, 512, 16, 16)), followed by a 2D average pool (output (B, 512)) and finally a fully connected (FC) layer (output (B, 128)). I trained this network with Loss 1. Later I load it and fine-tune it with either Loss 2 alone or Loss 1 + Loss 2.
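Here is a minimal sketch of network 1 as described. It assumes a 3-channel 16×16 input and a two-layer conv stack. The class name `Network1` and the layer choices are hypothetical; only the output shapes follow the description. `forward` returns both `out1` and `out1c`, matching the two outputs assumed in the steps below.

```python
import torch
import torch.nn as nn


class Network1(nn.Module):
    """Hypothetical stand-in for network 1: conv layers -> avgpool -> FC."""

    def __init__(self, in_channels: int = 3, feat_channels: int = 512, out_dim: int = 128):
        super().__init__()
        # Assumed conv stack; padding=1 keeps a 16x16 input at 16x16.
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, feat_channels, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # Global average pool: (B, 512, 16, 16) -> (B, 512, 1, 1)
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        # FC layer; its weight has shape (out_dim, feat_channels) = (128, 512)
        self.fc = nn.Linear(feat_channels, out_dim)

    def forward(self, x: torch.Tensor):
        out1c = self.features(x)                 # (B, 512, 16, 16)
        pooled = self.avgpool(out1c).flatten(1)  # (B, 512)
        out1 = self.fc(pooled)                   # (B, 128)
        return out1, out1c


model_ft = Network1()
out1, out1c = model_ft(torch.randn(2, 3, 16, 16))
print(out1.shape, out1c.shape)  # torch.Size([2, 128]) torch.Size([2, 512, 16, 16])
```

Note that `model_ft.fc.weight` has shape (128, 512). This is what makes `torch.mm(out1, model_ft.fc.weight)` produce one weight per conv channel.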

Here are the steps I use for fine-tuning:

- Load network 1 (let's call it `model_ft`).
- Define the optimizer: `optimizer = optim.Adam(model_ft.parameters(), lr=0.0001)`.
- Run network 1 on some input. Assume it returns two outputs: `out1` of size (B, 128) and `out1c` of size (B, 512, 16, 16).
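- Use the outputs above to compute the activation map:

  `weights_for_maps = torch.mm(out1, model_ft.fc.weight)` gives a (B, 512) tensor, one weight per channel per sample.

  `out_final = weights_for_maps[:, :, None, None] * out1c` weights each channel of `out1c`, giving the activation map `out_final` of size (B, 512, 16, 16).

- Compute Loss 2 as a pixel-wise loss between `out_final` and some ground truth.
- Call `loss2.backward()`, then `optimizer.step()`.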
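The fine-tuning steps above can be sketched end to end as follows. This is an illustration under assumptions:

- the pixel-wise loss is taken to be MSE (`F.mse_loss`);
- `TinyNet`, `compute_cam` and `finetune_step` are hypothetical names;
- `TinyNet` is a single-conv stand-in that only reproduces network 1's output shapes;
- the random `target` replaces the real ground truth.

The channel weighting uses broadcasting because `torch.dot` only accepts 1-D tensors.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim


def compute_cam(out1: torch.Tensor, out1c: torch.Tensor, fc_weight: torch.Tensor) -> torch.Tensor:
    """out1: (B, 128), out1c: (B, 512, H, W), fc_weight: (128, 512) -> (B, 512, H, W)."""
    weights_for_maps = torch.mm(out1, fc_weight)       # (B, 512): one weight per channel
    # Broadcast (B, 512, 1, 1) against (B, 512, H, W) to scale each channel.
    return weights_for_maps[:, :, None, None] * out1c


def finetune_step(model, optimizer, x, target):
    """One Loss-2-only update; the pixel-wise loss is assumed to be MSE."""
    optimizer.zero_grad()
    out1, out1c = model(x)
    out_final = compute_cam(out1, out1c, model.fc.weight)
    loss2 = F.mse_loss(out_final, target)
    loss2.backward()
    optimizer.step()
    return loss2.item()


class TinyNet(nn.Module):
    """Hypothetical stand-in with the same output shapes as network 1."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 512, kernel_size=3, padding=1)
        self.fc = nn.Linear(512, 128)

    def forward(self, x):
        out1c = torch.relu(self.conv(x))        # (B, 512, 16, 16)
        out1 = self.fc(out1c.mean(dim=(2, 3)))  # global average pool -> (B, 128)
        return out1, out1c


torch.manual_seed(0)
model_ft = TinyNet()
optimizer = optim.Adam(model_ft.parameters(), lr=0.0001)
x = torch.randn(2, 3, 16, 16)
target = torch.randn(2, 512, 16, 16)  # placeholder ground truth
losses = [finetune_step(model_ft, optimizer, x, target) for _ in range(3)]
print(losses)
```

`compute_cam` reads `model.fc.weight` directly. As a result, Loss 2 reaches the FC weights both through `out1` and through the channel weighting itself.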

Current problem: the training loop runs, but the gradients are zero; equivalently, Loss 2 stays constant over epochs. If I use `(loss1 + loss2).backward()` instead, the gradients are non-zero, but that is due to Loss 1.
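One way to narrow down where the gradient stops is to inspect every parameter's `.grad` right after `loss2.backward()`. The helper `grad_report` below is a hypothetical sketch for that purpose, not a fix:

- `None` means backward never reached the parameter.
- `0.0` means it was reached but the gradient vanished.

The demo uses a toy `nn.Sequential` model. It simulates a graph break with `.detach()` purely for illustration; it is an assumption, not a finding, that the real model breaks the graph this way. Checking `out_final.requires_grad` before calling backward is a similar quick test.

```python
import torch
import torch.nn as nn


def grad_report(model: nn.Module) -> dict:
    """Map each parameter name to its gradient's L2 norm, or None if it has no grad."""
    return {
        name: (None if p.grad is None else p.grad.norm().item())
        for name, p in model.named_parameters()
    }


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 1))
x = torch.randn(5, 4)

# Intact graph: backward reaches every parameter.
model(x).sum().backward()
report_full = grad_report(model)
print(report_full)

# Simulated graph break: detaching the hidden activations hides layer 0
# from backward, so its entries stay None.
model.zero_grad(set_to_none=True)
hidden = model[1](model[0](x)).detach()
model[2](hidden).sum().backward()
report_detached = grad_report(model)
print(report_detached)
```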