Multi-label classification: all predictions become zero

I’m working on a multi-label classification problem. There are 122 labels, and for each sample at most 10 of them are one and the rest are zero (so the targets are sparse multi-hot vectors).
I used BCEWithLogitsLoss as the loss function in two ways:

1- without weight

```python
criterion = nn.BCEWithLogitsLoss()
outputs = self.model(inputs)
loss = criterion(outputs, targets)
```

2- with weight

```python
criterion = nn.BCEWithLogitsLoss(reduction='none')
outputs = self.model(inputs)
loss = criterion(outputs, targets)
loss = (loss * self.CLASS_WEIGHT).mean()
```
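For context, here is a self-contained version of the weighted setup (random tensors stand in for the real model outputs and targets; shapes follow the description above). It also shows that, assuming `CLASS_WEIGHT` is a per-class weight vector, the manual multiplication is equivalent to passing the weights via the `weight` argument of `BCEWithLogitsLoss`:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
NUM_LABELS = 122  # from the problem description
BATCH = 4

# Stand-ins for real model outputs and sparse multi-hot targets.
outputs = torch.randn(BATCH, NUM_LABELS)                  # raw logits, no sigmoid
targets = (torch.rand(BATCH, NUM_LABELS) < 0.05).float()  # mostly zeros
class_weight = torch.rand(NUM_LABELS) + 0.5               # hypothetical per-class weights

# Manual weighting, as in the snippet above.
criterion = nn.BCEWithLogitsLoss(reduction='none')
loss_manual = (criterion(outputs, targets) * class_weight).mean()

# Equivalent built-in form: the `weight` tensor is broadcast against the
# elementwise loss before the mean reduction.
criterion_w = nn.BCEWithLogitsLoss(weight=class_weight)
loss_builtin = criterion_w(outputs, targets)

print(torch.allclose(loss_manual, loss_builtin))  # prints True
```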

In both cases, after a few iterations of the first epoch the F1 score drops to zero, because the model predicts all zeros. Since the target tensor is sparse, how can I adjust the loss function to prevent this?

Hello! For simplicity, let’s focus on the “1- without weight” version. Even though the target is sparse, you shouldn’t be seeing predictions all go to zero. Can you share an executable snippet of code that has this issue, just so we can double-check that you’re not doing anything unusual in the model? For example, a common mistake is to pass the output through your own Sigmoid layer, which you don’t need to do since your loss function does it for you already.
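To illustrate that mistake: BCEWithLogitsLoss already combines a sigmoid with binary cross-entropy, so the model should output raw logits. A minimal sketch with stand-in random tensors:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 122)                   # raw model outputs (no sigmoid)
targets = (torch.rand(4, 122) < 0.05).float()  # sparse multi-hot targets

criterion = nn.BCEWithLogitsLoss()

# Correct: feed raw logits; the loss applies sigmoid internally.
loss_ok = criterion(logits, targets)

# Same value via explicit sigmoid + plain BCE, showing the sigmoid is built in.
loss_explicit = F.binary_cross_entropy(torch.sigmoid(logits), targets)
print(torch.allclose(loss_ok, loss_explicit))  # prints True

# Buggy: a sigmoid inside the model means sigmoid gets applied twice,
# which distorts the loss and flattens the gradients.
loss_buggy = criterion(torch.sigmoid(logits), targets)
print(torch.allclose(loss_ok, loss_buggy))  # prints False
```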