Is it reasonable to use the gradient argument of loss.backward() (the counterpart of grad_tensors in torch.autograd.backward()) as a "weight" for a given batch?
I have a problem where, for every positive example, many negative examples can be generated virtually for free. However, if for every loss_pos.backward() I calculate and call loss_neg.backward() many times, my network will be over-optimized for detecting negative examples at the expense of positive examples. Would it be reasonable to calculate and call loss_neg.backward(torch.tensor(0.1)) 10 times for every loss_pos.backward()?
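
A minimal sketch of what I have in mind, assuming a toy linear model, BCEWithLogitsLoss, and randomly generated batches (all hypothetical); it scales each negative backward pass by 0.1 through the gradient argument of Tensor.backward (which must be a tensor, not a plain float), letting the gradients accumulate before a single optimizer step:

```python
import torch
import torch.nn as nn

# Hypothetical toy setup just to illustrate the weighting idea.
model = nn.Linear(10, 1)
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

x_pos = torch.randn(8, 10)   # one positive batch (made-up data)
y_pos = torch.ones(8, 1)

optimizer.zero_grad()

# Positive loss: ordinary backward, implicit gradient of 1.0.
loss_pos = criterion(model(x_pos), y_pos)
loss_pos.backward()

# Ten cheap negative batches, each backward pass scaled by 0.1 via the
# `gradient` argument (grad_tensors in torch.autograd.backward terms).
# Gradients accumulate into each parameter's .grad across these calls.
for _ in range(10):
    x_neg = torch.randn(8, 10)   # made-up negative batch
    y_neg = torch.zeros(8, 1)
    loss_neg = criterion(model(x_neg), y_neg)
    loss_neg.backward(torch.tensor(0.1))

optimizer.step()
```

As far as I can tell, scaling the loss itself, i.e. (0.1 * loss_neg).backward(), would produce the same accumulated gradients, so my question is really whether either form of down-weighting the negative batches is a sensible way to balance the two classes.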