Let’s say we have an nB x 2048 prediction tensor and an nB x 2048 ground truth/label tensor, and we plug both of them into BCEWithLogitsLoss.

The docs say:

* **pos_weight** ([*Tensor*](https://pytorch.org/docs/stable/tensors.html#torch.Tensor), *optional*) – a weight of positive examples. Must be a vector with length equal to the number of classes.

But if you pass a pos_weight of shape nB x 2048 instead of the documented 2048, BCEWithLogitsLoss still accepts it. Why is that? What happens in that case: does it just weight every single element individually?
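
To make the question concrete, here is a minimal sketch of the two cases. It assumes that pos_weight is simply broadcast against the target, so that a full nB x 2048 tensor acts as a per-element weight on the positive term. The batch size, the random weights, and the hand-written reference formula are my own illustrative choices, not something taken from the docs.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
nB, nC = 4, 2048  # illustrative batch size and number of classes

logits = torch.randn(nB, nC)
targets = torch.randint(0, 2, (nB, nC)).float()

pw_vec = torch.rand(nC) + 0.5       # documented shape: one weight per class
pw_full = torch.rand(nB, nC) + 0.5  # undocumented shape: one weight per element

# reduction="none" keeps the per-element losses so they can be compared directly
loss_vec = F.binary_cross_entropy_with_logits(
    logits, targets, pos_weight=pw_vec, reduction="none")
loss_full = F.binary_cross_entropy_with_logits(
    logits, targets, pos_weight=pw_full, reduction="none")

# The per-class vector should behave like the same vector repeated for every row
loss_vec_expanded = F.binary_cross_entropy_with_logits(
    logits, targets, pos_weight=pw_vec.expand(nB, nC), reduction="none")
vec_matches_expanded = torch.allclose(loss_vec, loss_vec_expanded)

# Hand-written reference (assumption: the weight multiplies only the positive term)
#   loss = -[ pw * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x)) ]
manual_full = -(pw_full * targets * F.logsigmoid(logits)
                + (1 - targets) * F.logsigmoid(-logits))
full_matches_manual = torch.allclose(loss_full, manual_full, atol=1e-5)

print(vec_matches_expanded, full_matches_manual)
```

If both checks come out true, the nB x 2048 pos_weight is not being rejected or silently reduced. Each entry scales the positive term of the matching logit/target pair, and the length-2048 vector is just the special case where every row shares the same weights.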