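Yes, your idea sounds reasonable. I’m not familiar with your exact use case, but you should consider normalizing the weighted loss by the sum of the weights actually used, as seen here, rather than by the batch size. Otherwise the loss scale depends on which samples land in a batch, which can create data-dependent loss spikes.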
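To make the difference concrete, here is a minimal sketch in plain Python. The function names, loss values, and weights are made up for illustration and are not taken from your code. It compares dividing the weighted sum by the batch size with dividing it by the sum of the weights used.

```python
def weighted_mean_loss(losses, weights):
    """Weighted sum of per-sample losses divided by the sum of the weights used."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("sum of weights must be non-zero")
    return sum(l * w for l, w in zip(losses, weights)) / total_weight


def naive_mean_loss(losses, weights):
    """Weighted sum divided by the batch size; its scale follows the weights in the batch."""
    return sum(l * w for l, w in zip(losses, weights)) / len(losses)


# Hypothetical per-sample losses: every sample has the same unweighted loss.
losses = [1.0, 1.0, 1.0, 1.0]
common_batch = [1.0, 1.0, 1.0, 1.0]    # batch with only low-weight samples
rare_batch = [10.0, 10.0, 10.0, 1.0]   # batch dominated by high-weight samples

naive_common = naive_mean_loss(losses, common_batch)    # 1.0
naive_rare = naive_mean_loss(losses, rare_batch)        # 7.75, a spike caused only by batch composition
norm_common = weighted_mean_loss(losses, common_batch)  # 1.0
norm_rare = weighted_mean_loss(losses, rare_batch)      # 1.0, scale stays stable
```

With per-sample losses and weights stored as tensors, the same normalization would presumably look like `(loss * weight).sum() / weight.sum()`, assuming the weights in a batch never sum to zero.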