Is it reasonable to use the gradient argument of loss.backward() (the counterpart of grad_tensors in torch.autograd.backward()) as a "weight" for a given batch?
I have a problem where, for every positive example, many negative examples can be generated virtually for free. However, if for every loss_pos.backward() I calculate and call loss_neg.backward() many times, my network will be over-optimized for detecting negative examples at the expense of positive examples. Would it be reasonable to calculate and call loss_neg.backward(torch.tensor(0.1)) 10 times for every loss_pos.backward()?
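
A minimal sketch of what I have in mind, assuming a toy linear model, BCEWithLogitsLoss, and randomly generated batches (all hypothetical); it scales each negative backward pass by 0.1 through the gradient argument of Tensor.backward (which must be a tensor, not a plain float), letting the gradients accumulate before a single optimizer step:

```python
import torch
import torch.nn as nn

# Hypothetical toy setup just to illustrate the weighting idea.
model = nn.Linear(10, 1)
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

x_pos = torch.randn(8, 10)   # one positive batch (made-up data)
y_pos = torch.ones(8, 1)

optimizer.zero_grad()

# Positive loss: ordinary backward, implicit gradient of 1.0.
loss_pos = criterion(model(x_pos), y_pos)
loss_pos.backward()

# Ten cheap negative batches, each backward pass scaled by 0.1 via the
# `gradient` argument (grad_tensors in torch.autograd.backward terms).
# Gradients accumulate into each parameter's .grad across these calls.
for _ in range(10):
    x_neg = torch.randn(8, 10)   # made-up negative batch
    y_neg = torch.zeros(8, 1)
    loss_neg = criterion(model(x_neg), y_neg)
    loss_neg.backward(torch.tensor(0.1))

optimizer.step()
```

As far as I can tell, scaling the loss itself, i.e. (0.1 * loss_neg).backward(), would produce the same accumulated gradients, so my question is really whether either form of down-weighting the negative batches is a sensible way to balance the two classes.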