You could use BCEWithLogitsLoss. I’ve created a small dummy example in another thread.
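Since the dummy example from the other thread isn't reproduced here, this is a minimal sketch of a multi-label training step with nn.BCEWithLogitsLoss. The sizes, the nn.Linear model, and the random multi-hot targets are hypothetical placeholders. The last lines check that the loss matches a sigmoid followed by plain binary cross entropy.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical dummy setup: 8 samples, 10 input features, 5 independent labels
n_samples, n_features, n_classes = 8, 10, 5
model = nn.Linear(n_features, n_classes)  # outputs raw logits, no sigmoid layer
x = torch.randn(n_samples, n_features)
# Multi-hot targets (float): each sample may belong to several classes at once
target = torch.randint(0, 2, (n_samples, n_classes)).float()

criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

logits = model(x)
loss = criterion(logits, target)  # applies the sigmoid internally
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Same value as sigmoid + BCE, but BCEWithLogitsLoss is numerically more stable
reference = F.binary_cross_entropy(torch.sigmoid(logits), target)
print(loss.item(), reference.item())
```

Keeping the sigmoid inside the loss avoids the saturation issues you can get when taking the log of probabilities very close to 0 or 1.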
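Your model should output raw logits. BCEWithLogitsLoss applies the sigmoid internally, so don't add a sigmoid layer before the loss. To get per-class probabilities at prediction time, pass the logits through a sigmoid and then apply a threshold.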
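A minimal inference sketch, assuming the logits come from a model like the one above (the values here are made up): apply torch.sigmoid to get independent per-class probabilities, then threshold them. With a threshold of 0.5 this is equivalent to checking whether the logit is positive.

```python
import torch

# Hypothetical logits for 2 samples and 3 classes (e.g. the output of model(x))
logits = torch.tensor([[2.0, -1.0, -0.2],
                       [-3.0, 0.5, 4.0]])

# Inference only: turn logits into independent per-class probabilities
probs = torch.sigmoid(logits)

# Default threshold of 0.5, which is the same as logits > 0
preds = probs > 0.5
print(preds)
```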
The choice of threshold depends on your use case. For example, some classes might need high sensitivity (few missed positives), while others need high specificity (few false positives).
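To use a different threshold for each class, you can compare the probabilities against a per-class threshold tensor and let broadcasting match each column to its own threshold. The probabilities and threshold values below are hypothetical. A lower threshold favors sensitivity and a higher one favors specificity.

```python
import torch

# Hypothetical per-class probabilities (after torch.sigmoid): 2 samples, 3 classes
probs = torch.tensor([[0.40, 0.40, 0.60],
                      [0.20, 0.60, 0.80]])

# Hypothetical thresholds, one per class:
# class 0 favors sensitivity (low), class 2 favors specificity (high)
thresholds = torch.tensor([0.3, 0.5, 0.7])

# Shape (2, 3) compared against shape (3,): each column uses its own threshold
preds = probs > thresholds
print(preds)
```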
Have a look at the scikit-learn explanation of multi-class ROC curves, which can help you choose a threshold per class.
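As one possible way to derive per-class thresholds from ROC curves, this sketch runs sklearn.metrics.roc_curve on each label column and picks the threshold that maximizes Youden's J statistic (TPR minus FPR). Youden's J is a swapped-in heuristic, not something the post prescribes. If a class needs high sensitivity, you might instead pick the largest threshold that reaches a target TPR. The function name youden_thresholds and the small validation arrays are hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_curve


def youden_thresholds(y_true, y_score):
    """Hypothetical helper: one threshold per class via argmax of (TPR - FPR).

    y_true:  (n_samples, n_classes) multi-hot ground truth
    y_score: (n_samples, n_classes) sigmoid probabilities
    """
    best = []
    for c in range(y_true.shape[1]):
        fpr, tpr, thr = roc_curve(y_true[:, c], y_score[:, c])
        best.append(thr[np.argmax(tpr - fpr)])  # Youden's J = sensitivity + specificity - 1
    return np.array(best)


# Hypothetical validation labels and predicted probabilities for 2 classes
y_true = np.array([[0, 1], [0, 0], [1, 1], [1, 0], [0, 1], [1, 0]])
y_score = np.array([[0.1, 0.8], [0.3, 0.2], [0.7, 0.6],
                    [0.9, 0.3], [0.2, 0.7], [0.6, 0.4]])

thresholds = youden_thresholds(y_true, y_score)
print(thresholds)
```

Thresholds chosen this way should come from a validation set rather than the training data, otherwise they will look better than they generalize.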