Per-class and per-sample weighting

Ah, that sounds right. Let me repeat this back to make sure I’m on the same page.

My network would output a 3D tensor of shape (R recordings, C classes, S samples). CrossEntropyLoss, with reduction='none' and a class-weight tensor of length C (passed as the weight argument), would return a 2D tensor of losses of shape (R recordings, S samples). Then I would multiply each recording’s row of losses by that recording’s own 1D sample_weight tensor. Finally, I would average the result before calling backward().
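The steps above can be sketched roughly as follows. This is only an illustration under assumptions: the sizes R, C, S, the random logits and targets, and the weight values are all placeholders, and every recording is assumed to have the same number of samples so that they fit in one tensor.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder sizes (assumptions, not from a real dataset)
R, C, S = 4, 3, 10

# Stand-in for the network output: (R recordings, C classes, S samples)
logits = torch.randn(R, C, S, requires_grad=True)
# One integer class label per sample: (R, S)
targets = torch.randint(0, C, (R, S))

# Hypothetical per-class weights, one per class
class_weight = torch.tensor([1.0, 2.0, 0.5])
# Hypothetical per-sample weights, one row per recording: (R, S)
sample_weight = torch.rand(R, S)

# reduction="none" keeps one loss value per sample, already scaled
# by the class weight of that sample's target class
criterion = nn.CrossEntropyLoss(weight=class_weight, reduction="none")
per_sample_loss = criterion(logits, targets)  # shape (R, S)

# Apply the per-sample weights, then reduce to a scalar
weighted_loss = per_sample_loss * sample_weight
loss = weighted_loss.mean()
loss.backward()
```

Note that a plain mean divides by the number of samples, not by the sum of the weights; dividing by sample_weight.sum() instead would be a different normalization choice.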

Does that sound correct? Btw, each recording has a different number of samples, which shouldn’t matter if I understand the benefits of PyTorch’s dynamic graph.
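Recordings of different lengths cannot be stacked into a single (R, C, S) tensor without padding, so one possible way to handle them is to run each recording through the loss separately and accumulate. A minimal sketch, assuming hypothetical lengths, random stand-in data, and uniform class weights:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

C = 3
lengths = [5, 8, 3]  # hypothetical number of samples per recording

# Uniform class weights here, purely as a placeholder
criterion = nn.CrossEntropyLoss(weight=torch.ones(C), reduction="none")

total = torch.zeros(())
count = 0
all_logits = []
for S in lengths:
    # Stand-in for the network output on one recording: (1, C, S)
    logits = torch.randn(1, C, S, requires_grad=True)
    targets = torch.randint(0, C, (1, S))
    sample_weight = torch.rand(1, S)  # hypothetical per-sample weights

    # Per-sample losses for this recording, scaled by its own weights
    losses = criterion(logits, targets) * sample_weight  # shape (1, S)
    total = total + losses.sum()
    count += S
    all_logits.append(logits)

# Average over every sample in every recording, then backpropagate once
loss = total / count
loss.backward()
```

The alternative would be to pad all recordings to a common length and give the padded positions a sample weight of zero, so they do not contribute to the loss.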

I’m still wrapping my head around moving from Keras to PyTorch. Thank you.