As we know, during training some examples are easy and some are hard, and the hard examples generally produce high loss values. I want to focus on the hard examples in a batch. Some pseudocode is below:
```python
lossvalue = loss(varOutput, varTarget)         # loss built with reduction='none' -> per-example values
easy_index = lossvalue < lossvalue.mean()      # easy examples have below-average loss
lossvalue = lossvalue * (~easy_index).float()  # mask the easy examples' loss to zero
finalloss = lossvalue.mean()
```
(I originally went through `lossvalue.data.numpy()` and `np.where` to find the easy indices, but that detour leaves autograd and fails on GPU tensors, so I stay in torch here.)
So the easy examples would no longer contribute any gradient?
Is this correct? If an example's loss is masked to zero, does it produce no gradient (i.e., does it stop changing the weights)?
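Here is a minimal runnable sketch of what I mean. The `nn.Linear` model and `MSELoss` are just placeholders for my real network and loss, and the names `model`, `loss_fn`, and `hard_mask` are mine for illustration; the only real assumption is `reduction='none'` so the loss stays per-example:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 1)                 # toy model, stand-in for the real network
loss_fn = nn.MSELoss(reduction='none')  # per-example losses, not one scalar

x = torch.randn(8, 4)
y = torch.randn(8, 1)

out = model(x)
per_example = loss_fn(out, y).squeeze(1)                 # shape (8,)
hard_mask = (per_example >= per_example.mean()).float()  # 1 for hard, 0 for easy

masked = per_example * hard_mask  # easy examples' losses become exactly 0
masked.mean().backward()

# A loss term multiplied by 0 also has its gradient contribution
# multiplied by 0 (chain rule), so only the hard examples show up here.
print(model.weight.grad)
```

One detail I noticed: `masked.mean()` still divides by the full batch size (8 here), not by the number of hard examples; dividing by `hard_mask.sum()` instead would average only over the hard examples.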