Since my training samples have different importance, I wonder how to use this information in the training pass.
For example, I have a function that assigns each individual sample an importance factor. The first idea that comes to mind is to scale the gradients by this factor, so that some samples are emphasized. However, I think this is quite difficult to implement, since pytorch averages the loss at the mini-batch level.
Do you have any suggestions? Thanks.
I think the cleanest way is to write your own loss function that supports per-sample weights. Some loss functions, like BCELoss, already accept per-example weights, but be careful: others take a weight per class instead.
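A custom weighted loss could look roughly like the sketch below. It assumes a recent PyTorch release in which the functional losses accept `reduction="none"` (not available in 0.2); `weighted_cross_entropy` is a hypothetical helper name, and dividing by the sum of the weights is just one possible normalisation.

```python
import torch
import torch.nn.functional as F

def weighted_cross_entropy(logits, targets, sample_weights):
    """Cross-entropy where each sample contributes in proportion to its weight.

    Hypothetical helper: the name and the normalisation by the weight sum
    are illustrative choices, not part of PyTorch.
    """
    # reduction="none" keeps one loss value per sample instead of averaging.
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    # Divide by the total weight so the loss scale stays comparable across batches.
    return (per_sample * sample_weights).sum() / sample_weights.sum()

torch.manual_seed(0)
logits = torch.randn(4, 3, requires_grad=True)        # batch of 4 samples, 3 classes
targets = torch.tensor([0, 2, 1, 1])
sample_weights = torch.tensor([1.0, 0.0, 2.0, 1.0])   # sample 1 is ignored entirely

loss = weighted_cross_entropy(logits, targets, sample_weights)
loss.backward()
print(logits.grad)  # the row for sample 1 is zero because its weight is zero
```

Note that the `weight` argument of `F.cross_entropy` itself is a per-class weight, which is the pitfall mentioned above; the per-sample factor has to be applied to the unreduced loss.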
If you don’t want to do that, you could register a hook on the prediction to achieve this, similar to the following (on master / pytorch 0.2):
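import torch
from torch.autograd import Variable
a = Variable(torch.randn(3,3), requires_grad=True)
loss = a.sum()
loss.backward()  # unweighted baseline: a.grad is all ones
def scale_gradients(v, weights): # assumes v is batch x ...
    # reshape weights to batch x 1 x ... x 1; probably nicer to hard-code -1,1,...,1
    v.register_hook(lambda g: g*weights.view(*((-1,)+(len(g.size())-1)*(1,))))
b = Variable(torch.randn(3,3), requires_grad=True)
scale_gradients(b, Variable(torch.Tensor([1, 2, 3])))
loss2 = b.sum()
loss2.backward()  # row i of b.grad is now scaled by weights[i]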
For pytorch 0.1.12 you would need .expand_as(g) after the view.
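As a sanity check, here is a minimal sketch showing that scaling the gradient at the prediction gives the same parameter gradients as weighting the per-sample losses. It assumes a recent PyTorch release (plain tensors instead of Variable); the linear model, the squared-error loss, and the weight values are made up for illustration.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(5, 1)
x = torch.randn(4, 5)
y = torch.randn(4, 1)
weights = torch.tensor([1.0, 0.5, 2.0, 0.0])  # one importance factor per sample

# Route 1: weight the per-sample losses directly.
pred = model(x)
per_sample = ((pred - y) ** 2).squeeze(1)
(per_sample * weights).sum().backward()
grad_weighted_loss = model.weight.grad.clone()

# Route 2: unweighted loss, but scale the gradient flowing into the prediction.
model.zero_grad()
pred = model(x)
pred.register_hook(lambda g: g * weights.view(-1, 1))
((pred - y) ** 2).sum().backward()
grad_hooked = model.weight.grad.clone()

print(torch.allclose(grad_weighted_loss, grad_hooked, atol=1e-6))
```

The two routes agree because each sample's loss depends only on its own row of the prediction, so multiplying that row's gradient by the sample weight is the same as multiplying the sample's loss by it.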