I am training a network on 3D arrays (voxels) whose values lie in the range [-1, 1]. The distribution is such that most of the values are 1, and very few are between -1 and 1. During training the network does optimize, but only for the locations where there are 1's, as they dominate the loss. I am using the Adam optimizer. What can be done so that the network also learns for the locations where there are no 1's? I have tried assigning high weights to the locations with values less than 1, but it does not help.

What's your loss function? And how much higher a weight did you assign to the locations with values less than 1?

I use the L1 loss function. Initially I calculated the loss as the mean of the L1 loss, but that is dominated by the 1's, since 70-80% of the values are equal to 1. Later, I masked out the values equal to 1 and the values less than 1 and took their separate means. Then I set the weight to 10 for the values less than 1. But somehow the network is not converging.
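In case it helps the discussion, here is a minimal sketch (in PyTorch, assuming that's the framework) of the weighting scheme described above: voxels with target value below 1 get an assumed hyperparameter weight (10 in the example), and the loss is normalized by the total weight rather than the voxel count, so its scale stays stable as the class balance changes. The function name and the `weight_lt_one` parameter are hypothetical, just for illustration.

```python
import torch

def weighted_l1_loss(pred, target, weight_lt_one=10.0, eps=1e-6):
    """L1 loss that upweights voxels whose target value is below 1.

    Voxels equal to 1 get weight 1; everything below 1 gets
    `weight_lt_one` (assumed hyperparameter). Normalizing by the sum
    of weights (instead of the number of voxels) keeps the loss scale
    comparable across batches with different class balances.
    """
    abs_err = (pred - target).abs()
    weights = torch.where(
        target < 1.0,
        torch.full_like(target, weight_lt_one),
        torch.ones_like(target),
    )
    return (weights * abs_err).sum() / (weights.sum() + eps)

# Tiny 3D example: one "rare" voxel (-1) among seven 1's.
target = torch.ones(2, 2, 2)
target[0, 0, 0] = -1.0
pred = torch.ones(2, 2, 2)  # predicts the dominant class everywhere
loss = weighted_l1_loss(pred, target)
```

With normalization by voxel count, predicting all 1's in this example would give a small mean error (2/8 = 0.25); with weight normalization the single rare voxel contributes 20/17 ≈ 1.18, so it can no longer be ignored.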

Oh interesting, so when you weighted the smaller class by a factor of 10, it just struggled to converge in general, right? (While in the original case without weights, it converges to a point where it always predicts the more populated class.)

Is data augmentation or self-supervised pretraining an option? Or perhaps you just need more data instead of overweighting the smaller class?