Using `grad_tensors` as a weight for a given batch

Is it reasonable to use the `grad_tensors` argument of `torch.autograd.backward()` (the `gradient` argument of `loss.backward()`) as a "weight" for a given batch?

I have a problem where, for every positive example, many negative examples can be generated virtually for free. However, if for every `loss_pos.backward()` I compute and call `loss_neg.backward()` many times, my network will be over-optimized for detecting negative examples at the expense of positive examples. Would it be reasonable to call `loss_neg.backward(torch.tensor(0.1))` 10 times for every `loss_pos.backward()`?

Sure, you can do that. You can also compute a weighted sum of the losses and call backward on it once.
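A minimal sketch of both approaches, assuming a toy linear model (the model and data here are illustrative, not from the original post). The first variant scales each backward pass through the `gradient` argument of `Tensor.backward()`; the second folds the weight into a single combined loss. Both accumulate the same gradients:

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
x_pos = torch.randn(2, 4)
x_neg = torch.randn(2, 4)

# Approach 1: weight each backward pass via the `gradient` argument
# (the same role `grad_tensors` plays in torch.autograd.backward()).
model.zero_grad()
model(x_pos).mean().backward(torch.tensor(1.0))
model(x_neg).mean().backward(torch.tensor(0.1))  # down-weight negatives
grads_a = [p.grad.clone() for p in model.parameters()]

# Approach 2: weighted sum of the losses, one backward() call.
# The forward passes are recomputed because the earlier graphs were freed.
model.zero_grad()
loss = model(x_pos).mean() + 0.1 * model(x_neg).mean()
loss.backward()
grads_b = [p.grad.clone() for p in model.parameters()]

# Both approaches leave identical gradients in p.grad.
for a, b in zip(grads_a, grads_b):
    assert torch.allclose(a, b, atol=1e-6)
```

Note that gradients accumulate in `p.grad` across backward calls, so the separate-call variant only needs `zero_grad()` once per optimizer step, not between the positive and negative passes.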


That makes sense. Thanks!