The problem with this statement is that a leaf tensor is created (torch.randn(..., requires_grad=True)) and then immediately hidden, because nn.Softmax() returns a new, non-leaf tensor.
To make this work, try something like:
initial_weights = nn.Softmax(dim=0)(torch.randn(n_classes, device=device))
loss_weights = torch.zeros_like(initial_weights, requires_grad=True)
# Don't record the following operation in autograd
with torch.no_grad():
    loss_weights.copy_(initial_weights)
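With this setup, loss_weights is a leaf tensor that an optimizer can update directly. A quick self-contained sanity check (the n_classes value and device here are just placeholders for illustration):

```python
import torch
import torch.nn as nn

n_classes = 4      # assumed value for illustration
device = "cpu"     # assumed device

initial_weights = nn.Softmax(dim=0)(torch.randn(n_classes, device=device))
loss_weights = torch.zeros_like(initial_weights, requires_grad=True)
# Don't record the copy in autograd, so loss_weights stays a leaf
with torch.no_grad():
    loss_weights.copy_(initial_weights)

print(loss_weights.is_leaf)        # True: a leaf tensor an optimizer can update
print(loss_weights.requires_grad)  # True
```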
Hi @richard, thanks for your kind help. I ended up changing my code in a different way, but then got another error: 'the derivative for 'weight' is not implemented'. So it seems impossible to optimize the weights inside the CrossEntropyLoss function; the official documentation describes weight as "a manual rescaling weight". Maybe I need to implement the cross-entropy loss myself using torch's log functions.
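That manual implementation is straightforward with F.log_softmax. A sketch of a weighted cross-entropy that lets gradients flow into the class weights (the function name weighted_cross_entropy and the shapes here are my own choices, not from the thread); dividing by the summed per-sample weights mirrors what nn.CrossEntropyLoss does with reduction='mean':

```python
import torch
import torch.nn.functional as F

def weighted_cross_entropy(logits, targets, weights):
    # Numerically stable log-probabilities
    log_probs = F.log_softmax(logits, dim=1)
    # Negative log-likelihood of each sample's target class
    nll = -log_probs[torch.arange(logits.size(0)), targets]
    # Per-sample weight, differentiable w.r.t. `weights`
    w = weights[targets]
    # Weighted mean, matching CrossEntropyLoss's reduction='mean'
    return (w * nll).sum() / w.sum()

n_classes, batch = 4, 8
logits = torch.randn(batch, n_classes)
targets = torch.randint(0, n_classes, (batch,))
weights = torch.softmax(torch.randn(n_classes), dim=0).clone().requires_grad_()

loss = weighted_cross_entropy(logits, targets, weights)
loss.backward()
print(weights.grad)  # gradients now reach the class weights
```

Unlike passing weight= to CrossEntropyLoss, the loss here is built from differentiable ops only, so backward() populates weights.grad and the weights can be optimized like any other parameter.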