Multiscale Prediction Loss

Hello, I am currently using a custom model that extracts feature maps at two scales (currently 256x80x80 and 128x160x160). I then use two classification heads, each acting on one of the feature maps. After this, I calculate the loss of each head and add the two together. Lastly, I call total_loss.backward(), where total_loss is the combined loss.
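
A minimal sketch of the setup described above, to make the question concrete. It assumes PyTorch, image-level classification with global average pooling, cross-entropy loss, and 10 classes; none of these details are given in the post. The ClassificationHead class, the loss weights, and the random tensors standing in for the backbone outputs are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ClassificationHead(nn.Module):
    """Hypothetical head: global average pooling followed by a linear layer."""

    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, feature_map: torch.Tensor) -> torch.Tensor:
        # (N, C, H, W) -> (N, C) by averaging over the spatial dimensions
        pooled = feature_map.mean(dim=(2, 3))
        return self.fc(pooled)


torch.manual_seed(0)
num_classes = 10  # assumption: not stated in the post
batch_size = 2

# Random stand-ins for the two backbone outputs (shapes taken from the post)
feat_coarse = torch.randn(batch_size, 256, 80, 80, requires_grad=True)
feat_fine = torch.randn(batch_size, 128, 160, 160, requires_grad=True)
targets = torch.randint(0, num_classes, (batch_size,))

# One head per scale
head_coarse = ClassificationHead(256, num_classes)
head_fine = ClassificationHead(128, num_classes)

# One loss per head
loss_coarse = F.cross_entropy(head_coarse(feat_coarse), targets)
loss_fine = F.cross_entropy(head_fine(feat_fine), targets)

# Assumed per-head weights; 1.0 each reproduces the plain sum from the post
w_coarse, w_fine = 1.0, 1.0
total_loss = w_coarse * loss_coarse + w_fine * loss_fine

# A single backward pass sends gradients through both heads
total_loss.backward()
```

With both weights set to 1.0 this is exactly the plain sum; the weights are exposed only so that the relative contribution of each head can be varied.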

I usually get worse results doing this than when predicting with just one head. Am I doing something wrong? What are good practices when dealing with multiscale models?