F.CROSS_ENTROPY weight parameter does not seem to have an effect

Hey Guys

I am currently building a multi-task ML model using both F.binary_cross_entropy and F.cross_entropy. Since my datasets are quite unbalanced, I am implementing class weights and passing them as arguments to both functions to counteract this. However, I am experiencing something quite peculiar when adding them to F.cross_entropy: the weights seem to have absolutely no effect on the loss, see below (code run in the VS Code debugger):

F.cross_entropy(out_AU_intensities[i][AU_idx], lab[AU_idx] - 1, weight = self.cw_int[i])
>tensor(1.3951, grad_fn=<NllLossBackward0>)

F.cross_entropy(out_AU_intensities[i][AU_idx], lab[AU_idx] - 1)
>tensor(1.3951, grad_fn=<NllLossBackward0>)

And the actual rescaling weights are as follows:

self.cw_int[i]
>tensor([5.5556e+00, 2.8571e+01, 1.0000e+05, 1.0000e+05])

To me it looks like the weights are simply not being applied. However, they do work when I call F.binary_cross_entropy:

F.binary_cross_entropy(out_AU, AUs, weight = self.cw_AU)
>tensor(20765.5312, grad_fn=<BinaryCrossEntropyBackward0>)
F.binary_cross_entropy(out_AU, AUs)
>tensor(0.6949, grad_fn=<BinaryCrossEntropyBackward0>)
self.cw_AU
>tensor([1.0000e+00, 1.0000e+05, 1.2500e-01, 1.0000e+05, 1.0000e+05, 1.0000e+05,
        4.0000e-02, 1.6667e-01, 1.6667e-01, 3.3333e-01, 1.0101e-02, 1.9231e-02])

Has anyone else experienced this problem, or am I just misunderstanding the implementation?

My guess is that your target contains the same label for all samples in the batch, which would cancel the weighting during the normalization step.
This post shows how the weighting is implemented internally: with the default reduction='mean', the weighted per-sample losses are summed and then normalized by the sum of the per-sample weights in the last step. If all labels in the batch are equal, every sample gets the same weight, so it cancels out and the weighting has no effect.
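To make the cancellation concrete, here is a minimal sketch (with dummy tensors, not your actual model outputs) that compares the weighted and unweighted losses and reproduces the normalization by hand:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)                     # 4 samples, 3 classes
weight = torch.tensor([1.0, 10.0, 100.0])      # per-class weights

# Every sample has the same target class -> weighting has no effect
same_targets = torch.tensor([1, 1, 1, 1])
print(F.cross_entropy(logits, same_targets))                 # unweighted
print(F.cross_entropy(logits, same_targets, weight=weight))  # identical value

# Manual computation mirroring the default reduction='mean':
# sum_i weight[target_i] * nll_i / sum_i weight[target_i]
nll = -F.log_softmax(logits, dim=1)[torch.arange(4), same_targets]
w = weight[same_targets]
print((w * nll).sum() / w.sum())  # the common weight cancels -> plain mean

# With mixed targets the weights do change the loss
mixed_targets = torch.tensor([0, 1, 2, 1])
print(F.cross_entropy(logits, mixed_targets))
print(F.cross_entropy(logits, mixed_targets, weight=weight))

As a side note, F.binary_cross_entropy behaves differently here: with reduction='mean' it multiplies each element's loss by its weight and then divides by the number of elements rather than by the sum of the weights, which is why the weighted value in your binary_cross_entropy example changes so drastically.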