I have one network and two different loss functions. One of them, a kind of regularizer, is a weighted loss whose weight decays with the number of data points. I want the regularizer's effect to fade as the number of data points increases. However, even with a large number of data points it still has a very strong effect.
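import torch.optim as optimizer  # assumption: `optimizer` is an alias for torch.optim
model = Model()
# Two separate Adam optimizers over the same parameters
opt1 = optimizer.Adam(model.parameters())
opt2 = optimizer.Adam(model.parameters())
loss1 = loss_function1(model, inp, out)
# Regularizer term, weighted by 10 / num_samples
loss2 = (10 / num_samples) * loss_func2(model, inp, out)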
When num_samples is very large (10^9), the weight is tiny (10/10^9 = 10^-8), so loss_func2's effect should be negligible. However, it still has a strong effect. With fewer data points it works as expected and has a strong effect. I am not sure what's wrong.
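One possible explanation, assuming loss2 is backpropagated on its own and stepped with opt2 (the post does not show the training loop): Adam divides each gradient by a running estimate of its magnitude, so multiplying a loss by a constant barely changes the size of the update. The sketch below is plain Python rather than PyTorch. It computes the first Adam update for a scalar gradient using the standard Adam formula with PyTorch's default hyperparameters. The function name adam_first_step and the raw gradient value of 1.0 are hypothetical, chosen only for illustration.

```python
import math


def adam_first_step(grad, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Size of the update Adam applies on its first step for a scalar gradient."""
    m = (1 - beta1) * grad           # first-moment (mean) estimate
    v = (1 - beta2) * grad * grad    # second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1)          # bias correction for step t = 1
    v_hat = v / (1 - beta2)
    return lr * m_hat / (math.sqrt(v_hat) + eps)


raw_grad = 1.0  # hypothetical gradient of the unweighted loss_func2 term
for num_samples in (10, 10**3, 10**6, 10**9):
    weight = 10 / num_samples
    step = adam_first_step(weight * raw_grad)
    print(f"num_samples={num_samples:>10}  weight={weight:.0e}  update={step:.3e}")
```

Under these assumptions the weight shrinks by eight orders of magnitude while the update shrinks only by about half, and that reduction comes from eps rather than from the weight itself. If this is what is happening, summing the two losses and stepping a single optimizer, or using plain SGD for the regularizer, would let the weight control the regularizer's relative influence.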