# How weights are being used in Cross Entropy Loss

I checked the docs and the explanation of `weight` in `CrossEntropyLoss`. But when I checked it with more than two samples, I got different results, as shown below.
For the snippet below:

```python
import numpy as np
import torch
import torch.nn as nn

inp = torch.tensor([[0.9860, 0.1934],
                    [0.9590, 0.3538],
                    [0.1502, 0.9544],
                    [0.7666, 0.0535],
                    [0.1600, 0.3133],
                    [0.1827, 0.8578],
                    [0.2727, 0.7105],
                    [0.3965, 0.0156]])

target = torch.tensor([1, 1, 1, 0, 0, 0, 1, 1])

cl_wts = 1. / torch.tensor([5., 3.])

loss = nn.CrossEntropyLoss()
loss_weighted = nn.CrossEntropyLoss(weight=cl_wts)

l1 = loss(inp, target)
print(l1)  # tensor(0.7793)

l_wt = loss_weighted(inp, target)
print(l_wt)  # tensor(0.7839)
```

When I checked it manually, I first applied softmax to `inp` (the results are class probabilities):

```python
probs = torch.softmax(inp, dim=1)
# tensor([[0.6884, 0.3116],
#         [0.6469, 0.3531],
#         [0.3091, 0.6909],
#         [0.6711, 0.3289],
#         [0.4617, 0.5383],
#         [0.3374, 0.6626],
#         [0.3923, 0.6077],
#         [0.5941, 0.4059]])

# Negative log-probability of each sample's target class, averaged over the batch.
manual_loss = -(np.log(0.3116) + np.log(0.3531) + np.log(0.6909) + np.log(0.6711)
                + np.log(0.4617) + np.log(0.3374) + np.log(0.6077) + np.log(0.4059))
manual_loss = manual_loss / 8  # 8 is the mini-batch size
print(manual_loss)  # 0.7793355874570308, which matches l1
```

However, for the weighted loss:

```python
man_loss_weighted = -(np.log(0.3116)*0.2 + np.log(0.3531)*0.2 + np.log(0.6909)*0.2
                      + np.log(0.6711)*0.33 + np.log(0.4617)*0.33 + np.log(0.3374)*0.33
                      + np.log(0.6077)*0.2 + np.log(0.4059)*0.2) / (0.2 + 0.33)
man_loss_weighted /= 8
print(man_loss_weighted)  # 0.3633250361678566
```

This does not match the weighted loss `l_wt`.

How is it being computed? Any help would be appreciated.
Thank you

Hi Shakeel!

You have two errors in your computation of `man_loss_weighted`:

First you need to divide by the sum of the weights used for each
individual sample.*

Second, you have mixed up your class-0 and class-1 weights.

Here is the correct manual computation:

```python
>>> -(np.log(0.3116)*0.333333 + np.log(0.3531)*0.333333 + np.log(0.6909)*0.333333 + np.log(0.6711)*0.2 + np.log(0.4617)*0.2 + np.log(0.3374)*0.2 + np.log(0.6077)*0.333333 + np.log(0.4059)*0.333333)/(0.333333 + 0.333333 + 0.333333 + 0.2 + 0.2 + 0.2 + 0.333333 + 0.333333)
0.784032260475451
```
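The same per-sample weighting can also be reproduced with tensor operations. The sketch below is only an illustration, not PyTorch's internal implementation: it gathers each sample's target-class log-probability, looks up each sample's weight as `cl_wts[target]`, and divides the weighted sum by the sum of those per-sample weights. The names `nll`, `w`, `manual_weighted`, and `builtin_weighted` are introduced here for illustration.

```python
import torch
import torch.nn as nn

inp = torch.tensor([[0.9860, 0.1934],
                    [0.9590, 0.3538],
                    [0.1502, 0.9544],
                    [0.7666, 0.0535],
                    [0.1600, 0.3133],
                    [0.1827, 0.8578],
                    [0.2727, 0.7105],
                    [0.3965, 0.0156]])
target = torch.tensor([1, 1, 1, 0, 0, 0, 1, 1])
cl_wts = 1. / torch.tensor([5., 3.])  # cl_wts[0] is class 0's weight, cl_wts[1] is class 1's

# Negative log-probability of the target class for each sample.
log_probs = torch.log_softmax(inp, dim=1)
nll = -log_probs.gather(1, target.unsqueeze(1)).squeeze(1)

# Each sample gets the weight of its own target class.
w = cl_wts[target]

# Weighted mean: divide by the sum of the weights actually used, not by the batch size.
manual_weighted = (w * nll).sum() / w.sum()

builtin_weighted = nn.CrossEntropyLoss(weight=cl_wts)(inp, target)
print(manual_weighted, builtin_weighted)
```

Both values should agree up to floating-point rounding.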

*) You have divided first by `cl_wts[0] + cl_wts[1]`. But you need
to divide by the actual weights used for each sample in the batch.
Suppose your batch contained only class-0 samples. In such a case
it wouldn’t make sense to use `cl_wts[1]` in the computation
at all. Then you divide by `8`. But suppose that all of your weights
were `1`. You would first divide by the sum of those weights, which
would be `8`, and then you would divide by `8` again, which would be
wrong.
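In formula form, the default `reduction='mean'` with class weights $w_c$ computes the following, where $x_{n,c}$ is the input (logit) for sample $n$ and class $c$, $y_n$ is that sample's target class, and $N$ is the batch size:

$$
\ell = \frac{\sum_{n=1}^{N} w_{y_n}\,\bigl(-\log p_{n,y_n}\bigr)}{\sum_{n=1}^{N} w_{y_n}},
\qquad
p_{n,c} = \frac{e^{x_{n,c}}}{\sum_{k} e^{x_{n,k}}}
$$

In the example above, the denominator is $5 \cdot \tfrac{1}{3} + 3 \cdot 0.2 \approx 2.267$ (five class-1 samples and three class-0 samples). With all $w_c = 1$ it reduces to $N$, so no further division by the batch size is needed.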

Best.

K. Frank

Thank you @KFrank
but since the weight tensor is

```python
cl_wts = 1. / torch.tensor([5., 3.])  # tensor([0.2000, 0.3333])
```

Why are you multiplying the samples with label 1 by 0.333? Doesn’t this give class 1 more importance, even though it already has more samples than the other class? Shouldn’t the larger weight go to the minority class instead? In our case 0.333 is the larger weight, so it should be applied to the minority class (class 0), and 0.2 to the majority class (class 1).

One more thing: when I pass the weights

```python
torch.tensor([0.2, 0.333])
```

does it work such that class 0 gets weight 0.2, class 1 gets 0.333, and so on?

Hi Shakeel!

I’m just using the weight tensor you specified.

Yes, this would be the typical approach. Note, however, I wouldn’t
reweight a batch of size 8 with the counts of the classes in that batch.
I would typically weight my classes based on the (approximate) class
counts in my whole training set (and I wouldn’t bother reweighting
unless the classes were much more imbalanced than 5-to-3).
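A minimal sketch of weighting by class counts over the whole training set. It assumes all training labels are available as a 1-D integer tensor (`train_targets` is a hypothetical stand-in with made-up counts) and uses inverse-frequency weights, which is only one common choice. `torch.bincount` returns counts indexed by class, so `class_weights[c]` lines up with class `c`, which is how `CrossEntropyLoss` reads its `weight` argument.

```python
import torch
import torch.nn as nn

# Hypothetical training labels: 100 samples of class 0 and 900 of class 1.
train_targets = torch.cat([torch.zeros(100, dtype=torch.long),
                           torch.ones(900, dtype=torch.long)])

num_classes = 2
# counts[c] is the number of training samples whose label is c.
counts = torch.bincount(train_targets, minlength=num_classes).float()

# Inverse-frequency weights, indexed by class: the rarer class gets the larger weight.
class_weights = 1.0 / counts
# Optional rescale so the weights sum to num_classes. With reduction='mean' the loss
# divides by the sum of the per-sample weights, so a global rescale leaves it unchanged.
class_weights = class_weights * num_classes / class_weights.sum()

criterion = nn.CrossEntropyLoss(weight=class_weights)
print(counts, class_weights)
```

Here the minority class 0 ends up with weight 1.8 and the majority class 1 with 0.2.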

Yes, that is correct. That is what I meant when I said that “you mixed
up your class-0 and class-1 weights.”
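One way to check which weight goes with which class is `reduction='none'`, which returns each sample's weighted loss instead of the mean. In the illustrative sketch below (random inputs, so the loss values themselves mean nothing), dividing the weighted per-sample losses by the unweighted ones recovers `cl_wts[targ]`, i.e. each sample is scaled by the weight at the index of its own target class.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
pred = torch.randn(8, 2)
targ = torch.tensor([1, 1, 1, 0, 0, 0, 1, 1])
cl_wts = torch.tensor([0.2, 1. / 3.])  # cl_wts[0] -> class 0, cl_wts[1] -> class 1

# Per-sample losses, without and with class weights.
plain = nn.CrossEntropyLoss(reduction='none')(pred, targ)
weighted = nn.CrossEntropyLoss(weight=cl_wts, reduction='none')(pred, targ)

# The ratio is the weight of each sample's own target class.
ratio = weighted / plain
print(ratio)
print(cl_wts[targ])
```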

Best.

K. Frank

@KFrank
Thank you, this makes sense now.

@KFrank
How do the weights affect backpropagation, so that we can be sure the model is focusing on the minority class? Do they boost the gradient, or do they increase the number of updates? Could you please clarify this?

Hi Shakeel!

I suggest that you try a quick test.

Don’t use a model. Just create `pred` with `requires_grad = True`:

```python
pred = torch.randn(10, 2, requires_grad=True)
```

Then create some target class labels:

```python
targ = torch.randint(2, (10,))
```

Then calculate and backpropagate `CrossEntropyLoss` with
various choices for `weight` and see what happens:

```python
torch.nn.CrossEntropyLoss(weight=my_class_weights)(pred, targ).backward()