I’m currently using a U-Net-style network to predict three target time-frequency features from a stereo recording: two directional features and one diffuseness feature. So the output shape of my network is something like
(features, frequency, timestep)
At present I’m using a single loss (MSE) over the whole output of the network. Apart from getting a more detailed view of the network’s progress on each individual target feature, are there any benefits to computing a separate loss for each feature and then summing them to get the overall loss for backprop?
My intuition is that if all target features are “equally important”, so that the losses are all equally weighted, it won’t affect the weight updates and therefore won’t affect the network’s training. But it may provide insight into which features the network can more easily learn the desired mapping for.
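One way to check this intuition numerically is a minimal sketch like the one below. It assumes NumPy and uses random stand-in arrays shaped (features, frequency, timestep); `pred`, `target`, and the helper functions are hypothetical names for illustration, not part of any framework. With equal-sized feature slices, the *average* of the per-feature MSEs equals the single MSE over the whole output, while their *sum* is F times larger (F = 3 here), so its gradient is the same direction scaled by F.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for network output and targets: (features, frequency, timestep)
pred = rng.standard_normal((3, 64, 100))
target = rng.standard_normal((3, 64, 100))


def mse(a, b):
    # Mean squared error over all elements of a and b
    return np.mean((a - b) ** 2)


# Single MSE over the whole output
combined = mse(pred, target)

# One MSE per feature, then summed or averaged
per_feature = [mse(pred[i], target[i]) for i in range(pred.shape[0])]
summed = sum(per_feature)
averaged = summed / len(per_feature)


def grad_combined(pred, target):
    # d/d(pred) of mean((pred - target)^2) taken over all elements
    return 2 * (pred - target) / pred.size


def grad_summed(pred, target):
    # Gradient of sum_i mean((pred[i] - target[i])^2); term i only depends on slice i
    g = np.empty_like(pred)
    for i in range(pred.shape[0]):
        g[i] = 2 * (pred[i] - target[i]) / pred[i].size
    return g


print(np.isclose(averaged, combined))  # averaged per-feature loss == single MSE
print(np.isclose(summed, 3 * combined))  # summed per-feature loss == F * single MSE
print(np.allclose(grad_summed(pred, target), 3 * grad_combined(pred, target)))
```

With plain SGD that constant factor behaves like a larger learning rate, whereas adaptive optimizers such as Adam are largely insensitive to a constant rescaling of the loss. The equivalence relies on every feature slice having the same number of elements and equal weights.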