This seems odd. For a 15-class classifier, I would expect your last
layer to produce 15 values (not 11).
In this case I would expect the output of your model (the input to BCEWithLogitsLoss) to have shape [nBatch, nClass = 15], and
your labels (the target) to have the same shape. The output of
your model should be logits, that is, the output of the last Linear
layer of your network, not passed through any non-linear activation
function. Your target values will typically be 0 or 1 (although they
could be probabilities that range from 0.0 to 1.0).
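To make the shapes concrete, here is a minimal sketch (with made-up batch and feature sizes; `nBatch` and `nClass` are just illustrative names) of logits and targets going into `BCEWithLogitsLoss`:

```python
import torch
import torch.nn as nn

nBatch, nClass = 4, 15   # hypothetical sizes for illustration

# Raw output of the last Linear layer -- no sigmoid applied.
logits = torch.randn(nBatch, nClass)

# Multi-label targets of the same shape, values 0.0 or 1.0.
target = torch.randint(0, 2, (nBatch, nClass)).float()

loss = nn.BCEWithLogitsLoss()(logits, target)
print(loss)   # a single scalar loss value
```

Note that `BCEWithLogitsLoss` applies the sigmoid internally (via `log-sum-exp` for numerical stability), which is why you should not pass your logits through a sigmoid yourself.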
Training with pos_weight (when greater than 1.0) should, indeed,
cause your trained network to produce more positive predictions. If
you make pos_weight large enough I would expect your network to
be able to make positive predictions (but, of course, possibly false
positives).
Can you successfully overfit your network by training it on a small
subset of your training set so that it produces (nearly) perfect results
on that subset, and therefore produces positive predictions?
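The overfitting sanity check might look something like the following sketch. The `nn.Linear` model, the random data, and the hyperparameters are all hypothetical stand-ins; the point is only the pattern of training repeatedly on one tiny, fixed batch and checking that the loss collapses and positive predictions appear:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
nFeat, nClass = 8, 15
model = nn.Linear(nFeat, nClass)      # stand-in for your real network

x = torch.randn(8, nFeat)             # a tiny, fixed "subset" of 8 samples
w_true = torch.randn(nFeat, nClass)
y = (x @ w_true > 0).float()          # separable multi-hot targets

criterion = nn.BCEWithLogitsLoss()
opt = torch.optim.Adam(model.parameters(), lr=0.1)

initial_loss = criterion(model(x), y).item()
for _ in range(500):                  # train on the same batch every step
    opt.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    opt.step()

with torch.no_grad():
    preds = (model(x) > 0.0).float()  # logit > 0.0 <=> probability > 0.5
    accuracy = (preds == y).float().mean().item()
    final_loss = criterion(model(x), y).item()
```

If your network cannot drive the training loss to (nearly) zero on a handful of samples, that points to a bug in the model, the targets, or the training loop, rather than to a class-imbalance issue.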