Hey there super people!
I am having trouble understanding the weight parameter of BCELoss. I am working on a binary classification problem: I have an RNN which, for each time step over a sequence, produces a binary classification. Precisely, it produces an output of size (batch, sequence_len) where each element is in the range 0 to 1 (a confidence score of how likely an event happened at that time step).
Because the ground truth is imbalanced (more 0s than 1s), I would like to make use of the weight parameter of BCELoss, but I clearly do not understand the docs well enough…

What is the size of this parameter and which elements does it take?

If you would like to use the weight as a class weight, I assume you have a binary target, i.e. it contains only zeros and ones.
In that case I would create a weight tensor and just multiply it with your unreduced loss. Here is a small example:
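A minimal sketch of that idea (the shapes and the weight values are made up for illustration; pick class weights that match your own imbalance):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-ins for a real model output and target: batch of 4, sequence length 5.
output = torch.rand(4, 5)                      # confidences in [0, 1]
target = torch.randint(0, 2, (4, 5)).float()   # binary ground truth

# Map each target element to a class weight:
# index 0 -> weight for negatives, index 1 -> weight for positives.
class_weights = torch.tensor([0.1, 0.9])       # example values
weight = class_weights[target.long()]          # same shape as target

criterion = nn.BCELoss(reduction='none')       # keep the per-element loss
loss = criterion(output, target)               # shape (4, 5)
loss = (loss * weight).mean()                  # weight, then reduce
```

With `reduction='none'` the loss keeps the shape of the target, so the element-wise multiplication applies a different weight to positive and negative positions before averaging.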

Good to hear.
Maybe the new pos_weight argument of nn.BCEWithLogitsLoss might work better, since there is a difference between my weighting approach and the implemented one. See this discussion.

Do I understand it correctly that if I have a binary model output of size (N, 1), the pos_weight argument should be of size (1)? If I have a class distribution of, let's say, 900:100 (900 zeros in my ground truth to 100 ones), then pos_weight = torch.Tensor([(900 / 100)])? I guess all zeros are negative examples and the ones are positives; that's also confusing me.
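For reference, with those numbers the setup would look like this (random tensors stand in for a real model output; note that nn.BCEWithLogitsLoss expects raw logits, not sigmoid outputs):

```python
import torch
import torch.nn as nn

# pos_weight is one value per output "class": the negative/positive ratio.
pos_weight = torch.tensor([900 / 100])         # = 9.0, shape (1)

criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(10, 1)                    # raw model output of size (N, 1)
target = torch.randint(0, 2, (10, 1)).float()
loss = criterion(logits, target)
```

Internally the positive term of the loss is scaled by pos_weight, so each of the 100 positives counts roughly as much as the 900 negatives combined.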

Alright, I tried it. My confusion matrix starts with only FNs and TNs, and the model never predicts the class I want it to (it still always outputs tensors of 0s).

EDIT: The softmax output seems to always be a tensor of (0.0325, …, 0.0325), already after the first batch

@ptrblck, I just wonder how to apply the weight in BCELoss in a real training scenario. That is, the dataset is split into training, validation, and test subsets. What is more, training runs over several epochs, and each epoch comprises one or more batches. So the weight can be calculated at the dataset level, the training-subset level, or even the batch level. How should I choose? Note that at the batch level the number of positive or negative examples in a class may be zero, so the following can occur: pos_weight = torch.Tensor([(negative examples / 0)]).
Another question: I use Adamax and BCELoss, but my F1 score is just 0.04. How can I tune the performance?

You would usually calculate the pos_weight using the complete training dataset and pass it to the instantiation of the criterion, so that a division by 0 should not occur.
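A short sketch of that: count positives and negatives once over the full training split, and construct the criterion a single time before the training loop (the 900:100 target tensor here is hypothetical):

```python
import torch
import torch.nn as nn

# Hypothetical binary targets for the entire training split.
train_targets = torch.cat([torch.zeros(900), torch.ones(100)])

num_pos = train_targets.sum()
num_neg = len(train_targets) - num_pos
pos_weight = (num_neg / num_pos).unsqueeze(0)  # shape (1), computed once

criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
# reuse `criterion` unchanged for every batch during training
```

Since the full training split contains at least one example of each class, the division is always well defined, unlike per-batch counts.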