Hello,

I am currently playing around with class weights in the cross-entropy loss and noticed that, with reduction="mean", I get different results for one-hot encoded targets vs. a sparse label (class-index) tensor.

It seems to me that the class weights are not applied in the case of mean reduction and label targets.

Please see the code snippet below, where I would expect both "mean" results to be the same, which is not the case.

Am I misunderstanding something here?

```
import torch

weight = torch.tensor([0.2, 0.8])
softmax = torch.nn.Softmax(-1)
cce_none = torch.nn.CrossEntropyLoss(weight=weight, reduction="none")
cce_mean = torch.nn.CrossEntropyLoss(weight=weight, reduction="mean")

input = torch.tensor([[.1, .6], [.3, .4]])
target = torch.tensor([[1.0, 0.0], [1.0, 0.0]])  # one-hot probabilities
sparse_target = torch.tensor([0, 0])             # class indices

print(input.shape)
print(target.shape)

print("\n log ")
soft = softmax(input)
print(soft)
print("softlog", torch.log(soft))
print("softlog*weight", torch.log(soft) * weight)

# one-hot (probability) targets
cce_loss = cce_none(input, target)
print("none", cce_loss)
cce_loss = cce_mean(input, target)
print("mean", cce_loss)

# sparse (class-index) targets
print("\n sparse")
print(sparse_target.shape)
cce_loss = cce_none(input, sparse_target)
print("none", cce_loss)
cce_loss = cce_mean(input, sparse_target)
print("mean", cce_loss)
```

Which gives me:

```
torch.Size([2, 2])
torch.Size([2, 2])
log
tensor([[0.3775, 0.6225],
        [0.4750, 0.5250]])
softlog tensor([[-0.9741, -0.4741],
        [-0.7444, -0.6444]])
softlog*weight tensor([[-0.1948, -0.3793],
        [-0.1489, -0.5155]])
none tensor([0.1948, 0.1489])
mean tensor(0.1718)
sparse
torch.Size([2])
none tensor([0.1948, 0.1489])
mean tensor(0.8592)
```
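For what it's worth, the per-sample ("none") losses are identical in both cases, and I can reproduce both mean values from them by hand if only the denominator changes. This is a minimal sketch of that check, assuming the weighted mean divides by the sum of the selected class weights for index targets but by the batch size for probability targets:

```python
import torch

weight = torch.tensor([0.2, 0.8])
input = torch.tensor([[.1, .6], [.3, .4]])
sparse_target = torch.tensor([0, 0])

# Per-sample weighted losses (same for both target types above)
cce_none = torch.nn.CrossEntropyLoss(weight=weight, reduction="none")
per_sample = cce_none(input, sparse_target)  # tensor([0.1948, 0.1489])

# Dividing by the batch size reproduces the one-hot "mean" result
mean_prob = per_sample.sum() / per_sample.numel()          # ≈ 0.1718

# Dividing by the sum of the selected class weights reproduces
# the class-index "mean" result
mean_idx = per_sample.sum() / weight[sparse_target].sum()  # ≈ 0.8592

print(mean_prob, mean_idx)
```

So the weights do seem to enter the per-sample losses in both cases; only the normalization of the mean appears to differ.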