I have one network and two different loss functions. The second loss function, a kind of regularizer, is weighted, and the weight decays with the number of data points, so I want its effect to vanish as the number of data points grows. However, even with a very large number of points it still has a strong effect.

import torch

model = Model()

opt1 = torch.optim.Adam(model.parameters())
opt2 = torch.optim.Adam(model.parameters())

# Primary loss
loss1 = loss_function1(model, inp, out)
opt1.zero_grad()
loss1.backward()
opt1.step()

# Regularizer, weighted so its influence decays with num_samples
loss2 = (10 / num_samples) * loss_func2(model, inp, out)
opt2.zero_grad()
loss2.backward()
opt2.step()

When num_samples is very large (10^9), the weight is tiny (10/10^9 = 10^-8), so loss_func2's effect should be negligible. However, it still has a strong effect. With smaller numbers of data points everything behaves as expected: the regularizer has a strong effect there, which is what the weighting is supposed to produce. I am not sure what's wrong.
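To rule out the weighting itself being the problem, I checked with a toy model (a hypothetical stand-in, not my actual network and losses) that scaling a loss by a constant does scale the raw gradients by that same constant:

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
inp = torch.randn(8, 4)
out = torch.randn(8, 1)

def grad_norm(scale):
    # Zero out old gradients, backprop the scaled loss, and return
    # the overall gradient norm across all parameters.
    model.zero_grad()
    loss = scale * torch.nn.functional.mse_loss(model(inp), out)
    loss.backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()]).norm().item()

num_samples = 10**9
unscaled = grad_norm(1.0)
scaled = grad_norm(10 / num_samples)
# The gradient norm shrinks by the same factor of 10 / num_samples = 1e-8.
print(unscaled, scaled)
```

So the gradients themselves are as small as expected; it is the parameter updates that still end up large.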