The way I think of it is:
Assume we are using nn.BCELoss() as the loss function.
As the documentation says, with reduction='none' no reduction will be applied.
So I can get the loss value of each sample by creating the criterion with reduction='none' and calling it on the model output and the target:
criterion_reduction_none = nn.BCELoss(reduction='none')
loss_reduction_none = criterion_reduction_none(output, target)
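For example, here is a minimal sketch of what that returns (the tensors output and target and the batch size of 4 are made up for illustration):
import torch
import torch.nn as nn
criterion_reduction_none = nn.BCELoss(reduction='none')
output = torch.rand(4, requires_grad=True)   # hypothetical predicted probabilities for 4 samples
target = torch.tensor([1., 0., 1., 0.])      # hypothetical binary labels
loss_reduction_none = criterion_reduction_none(output, target)
print(loss_reduction_none.shape)             # torch.Size([4]) -> one loss value per sample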
Then I can adjust the loss value of each sample like:
loss_reduction_none[1] = criterion_reduction_none[1] * scaling_factor_1,
loss_reduction_none[0] = criterion_reduction_none[2] * scaling_factor_2,
...
loss_reduction_none[n] = criterion_reduction_none[n] * scaling_factor_n,
Because the learning rate works like this:
w_new = w_old - learning_rate * (∂ loss / ∂ weight)
As mentioned above, I adjust the loss value of each sample, which indirectly solves the problem of adjusting the learning rate for each sample.
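To make that reasoning step concrete, here is a small hedged check (the single weight w, the input x, and the target y are made up): multiplying a sample's loss by a constant multiplies its gradient by the same constant, so it has the same effect as scaling the learning rate for that sample.
import torch
import torch.nn as nn
criterion = nn.BCELoss(reduction='none')
x = torch.tensor([0.5])                  # hypothetical input
y = torch.tensor([1.0])                  # hypothetical target
w = torch.tensor([0.3], requires_grad=True)
# gradient of the unscaled per-sample loss
loss = criterion(torch.sigmoid(w * x), y)
loss.sum().backward()
grad_unscaled = w.grad.clone()
# gradient of the same loss scaled by 2.0
w.grad = None
loss = criterion(torch.sigmoid(w * x), y) * 2.0
loss.sum().backward()
print(w.grad / grad_unscaled)            # -> tensor([2.]) : the gradient scales by the same factor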
Then I can reduce the scaled losses to a scalar (backward() needs a scalar, so the per-sample losses have to be summed or averaged first) and use the backward and step functions:
loss_reduction_none.sum().backward()
optimizer.step()
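Putting it together, a minimal end-to-end sketch could look like this (the model, data, and per-sample scaling factors are all made up for illustration; the key point is that the per-sample losses are scaled before being reduced to a scalar for backward()):
import torch
import torch.nn as nn
model = nn.Sequential(nn.Linear(10, 1), nn.Sigmoid())        # hypothetical model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion_reduction_none = nn.BCELoss(reduction='none')
inputs = torch.randn(4, 10)                                   # hypothetical batch of 4 samples
targets = torch.randint(0, 2, (4, 1)).float()                 # hypothetical binary labels
scaling_factors = torch.tensor([[1.0], [0.5], [2.0], [1.0]])  # hypothetical per-sample factors
optimizer.zero_grad()
per_sample_loss = criterion_reduction_none(model(inputs), targets)  # shape (4, 1): one loss per sample
scaled_loss = (per_sample_loss * scaling_factors).mean()            # scale, then reduce to a scalar
scaled_loss.backward()
optimizer.step()
Here the scaling is done with one vectorized multiply instead of assigning each element individually, which avoids in-place modification of the loss tensor.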
But I am not sure whether PyTorch will use the loss values that I have modified.
Actually, I don't know what kind of use case the reduction parameter is designed for.
Is this the right answer?
Maybe this question is similar to the one below: