Using different learning rates for different samples

I want to use different learning rates for different input samples when training my model. Any good suggestions? :thinking:

The approach I'm thinking of is:

Assume we use nn.BCELoss() as the loss function.
As stated here,

'none' : no reduction will be applied

So I can get the loss value of each sample by:

    criterion_reduction_none = nn.BCELoss(reduction='none')
    loss_reduction_none = criterion_reduction_none(output, target)

Then I can adjust the loss value of each sample like:

    loss_reduction_none[0] = loss_reduction_none[0] * scaling_factor_0
    loss_reduction_none[1] = loss_reduction_none[1] * scaling_factor_1
    ...
    loss_reduction_none[n] = loss_reduction_none[n] * scaling_factor_n
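
Put together, something like this (the model, data, and scaling factors below are placeholders I made up just to illustrate the idea):

    import torch
    import torch.nn as nn

    # Illustrative placeholders: a tiny model and a batch of 4 samples.
    model = nn.Sequential(nn.Linear(10, 1), nn.Sigmoid())
    criterion_reduction_none = nn.BCELoss(reduction='none')

    inputs = torch.randn(4, 10)
    targets = torch.randint(0, 2, (4, 1)).float()
    scaling_factors = torch.tensor([[1.0], [0.5], [2.0], [1.0]])  # one made-up factor per sample

    outputs = model(inputs)
    loss_reduction_none = criterion_reduction_none(outputs, targets)  # shape (4, 1): one loss per sample
    loss_reduction_none = loss_reduction_none * scaling_factors       # scale each sample's loss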

This works because the learning rate enters the update rule as:

w_new = w_old - learning_rate * (∂ loss / ∂ weight)

As mentioned above, by adjusting the loss value of each sample,
I indirectly solve the problem of adjusting the learning rate for each sample.
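
A quick sanity check of this reasoning: gradients are linear in the loss, so scaling a sample's loss by k scales its gradient by k, which has the same effect as scaling its learning rate by k.

    import torch

    w = torch.tensor(2.0, requires_grad=True)
    x = torch.tensor(3.0)

    loss = (w * x) ** 2
    loss.backward()
    print(w.grad)                    # tensor(36.), since d/dw (w*x)^2 = 2*w*x^2

    w.grad = None
    (0.5 * (w * x) ** 2).backward()  # same loss, scaled by 0.5
    print(w.grad)                    # tensor(18.): the gradient is halved too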

Then I can call the backward and step functions:

    loss_reduction_none.backward()
    optimizer.step()

But I am not sure whether PyTorch will use the loss values I have modified.
Actually, I don't know what kind of use case the reduction parameter is designed for.

Is this the right approach? :thinking:

Maybe this question is similar to the one below:


I second @shirui-japina’s suggestions. Do a nn.LossFn(reduction='none'), weigh the different samples to your liking and then do your own reduction (e.g. mean).
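
For example (the outputs, targets, and weights below are just stand-ins):

    import torch
    import torch.nn as nn

    criterion = nn.BCELoss(reduction='none')

    logits = torch.randn(5, 1, requires_grad=True)  # stand-in for raw model outputs
    outputs = torch.sigmoid(logits)                 # BCELoss expects probabilities in (0, 1)
    targets = torch.randint(0, 2, (5, 1)).float()
    weights = torch.tensor([[1.0], [1.0], [3.0], [0.5], [1.0]])  # made-up per-sample weights

    loss_per_sample = criterion(outputs, targets)   # shape (5, 1): one loss per sample
    loss = (loss_per_sample * weights).mean()       # weigh, then do your own reduction
    loss.backward()                                 # works because loss is now a scalar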


Thanks for your reply.

Do a nn.LossFn(reduction='none') , weigh the different samples to your liking

If the method above is correct (in fact, I think it is correct),
why do I need to do a reduction (e.g. mean)?
What will happen if I don't?

From mathematical theory:

  • Batch Gradient Descent
    Uses all samples for each update; when the number of samples is large, training is slow.

  • Stochastic Gradient Descent
    Updates on a single sample at a time; decreased accuracy, not guaranteed to be globally optimal.

  • Mini-batch Gradient Descent
    Uses a subset of samples for each parameter update.

Usually we use a mini-batch-gradient-descent-like method to update the parameters, which means we use the average value of the loss function over the batch to update them.
Is the process of averaging (doing the reduction, e.g. mean) equivalent to this process?
Or is the averaging (reduction, e.g. mean) just part of PyTorch's operating mechanism?

Yup :slight_smile: We have a loss for each sample in your batch, then take the mean to get one scalar loss which we can use to backpropagate. (edit: it doesn't need to be the mean; other reduction techniques such as sum also work)
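
You can check that the built-in reductions are exactly the mean/sum of the per-sample losses:

    import torch
    import torch.nn as nn

    outputs = torch.sigmoid(torch.randn(4, 1))
    targets = torch.randint(0, 2, (4, 1)).float()

    loss_none = nn.BCELoss(reduction='none')(outputs, targets)

    print(torch.allclose(loss_none.mean(), nn.BCELoss(reduction='mean')(outputs, targets)))  # True
    print(torch.allclose(loss_none.sum(),  nn.BCELoss(reduction='sum')(outputs, targets)))   # True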


Thanks for your reply. :smiley:
I think I have thoroughly understood this problem. :smiley::smiley:


I have reconsidered it and found some problems.

The topic above describes the operating mechanism of .backward() and optimizer.step() in PyTorch.
A simple summary:

  • .backward()
    Computes the gradients for the parameters in the model.

  • optimizer.step()
    Performs a parameter update based on the current gradient (stored in the .grad attribute of a parameter) and the update rule.
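
For example, a toy illustration of these two steps:

    import torch

    w = torch.tensor(1.0, requires_grad=True)
    optimizer = torch.optim.SGD([w], lr=0.1)

    loss = (w - 3.0) ** 2
    loss.backward()
    print(w.grad)      # tensor(-4.), i.e. 2 * (w - 3), stored in w.grad by .backward()

    optimizer.step()   # w <- w - lr * w.grad = 1.0 - 0.1 * (-4.0)
    print(w)           # tensor(1.4000, requires_grad=True)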

I mean, optimizer.step() takes care of the update method and actually updates the parameters; it is not up to PyTorch users to implement the update method. (So we shouldn't have to do the reduction (e.g. mean) or anything like that manually.)

What we should do is:

  1. Before calling .backward(), adjust the loss value of each sample.

  2. Then get the gradients for the parameters in the model with .backward().

  3. At the end, update the parameters based on the current gradients with optimizer.step().

The reduction parameter in LossFn() can be set to: 'none' | 'mean' | 'sum'.
But it is not there to average the loss values for the parameter update.
(Although I don't know what 'mean' | 'sum' are for.)

If you set the reduction flag to 'none' and try to call backward() on the un-reduced loss, you get this error message:

RuntimeError: grad can be implicitly created only for scalar outputs
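
A minimal way to reproduce it, and the fix (reduce to a scalar first):

    import torch
    import torch.nn as nn

    outputs = torch.sigmoid(torch.randn(4, 1, requires_grad=True))
    targets = torch.randint(0, 2, (4, 1)).float()

    loss = nn.BCELoss(reduction='none')(outputs, targets)  # shape (4, 1), not a scalar

    # loss.backward()       # -> RuntimeError: grad can be implicitly created only for scalar outputs
    loss.mean().backward()  # reduce to a scalar first, then backward works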


:scream::scream::scream: