The problem with this statement is that it creates a leaf tensor (torch.randn(..., requires_grad=True)) and then hides it, because nn.Softmax() returns a new, non-leaf tensor.
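To see the difference directly, here is a minimal sketch. The value of n_classes and the variable names hidden and leaf are made up for illustration; is_leaf is the standard tensor attribute.

```python
import torch
from torch import nn

n_classes = 4  # hypothetical value, for illustration only

# Softmax output: a new tensor produced by an autograd op, so not a leaf
hidden = nn.Softmax(dim=0)(torch.randn(n_classes, requires_grad=True))
print(hidden.is_leaf)  # False

# A tensor created directly with requires_grad=True is a leaf
leaf = torch.zeros(n_classes, requires_grad=True)
with torch.no_grad():
    # Copy the softmax values in without recording the copy in autograd
    leaf.copy_(hidden)
print(leaf.is_leaf)  # True
```

Only the second tensor is something an optimizer can update in place.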

To make this work, try something like:

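import torch
from torch import nn

# dim=0 because the weights are a 1-D vector of length n_classes
initial_weights = nn.Softmax(dim=0)(torch.randn(n_classes, device=device))
# Create the leaf tensor that will actually be optimized
loss_weights = torch.zeros_like(initial_weights, requires_grad=True)
# Don't record the following operation in autograd
with torch.no_grad():
    loss_weights.copy_(initial_weights)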

Hi @richard, thanks for your kind help. In the end I changed my code in a different way, but then got another error: 'the derivative for 'weight' is not implemented'. So it seems the weights of CrossEntropyLoss cannot be optimized; the official documentation describes weight as a manual rescaling weight. Maybe I need to implement the cross entropy loss myself using a log function from torch.
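A rough sketch of what that could look like, built on F.log_softmax so that autograd can differentiate with respect to the class weights. The helper name weighted_cross_entropy, the tensor shapes, and the toy data are my own assumptions, not anything from the thread above.

```python
import torch
import torch.nn.functional as F

def weighted_cross_entropy(logits, targets, class_weights):
    """Hypothetical helper: weighted cross entropy built from log_softmax.

    Assumes logits is (N, C) float, targets is (N,) long,
    and class_weights is (C,) float.
    """
    log_probs = F.log_softmax(logits, dim=1)
    # Log-probability of the true class for each sample
    picked = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    # Per-sample weight looked up from the per-class weights
    sample_w = class_weights[targets]
    # Weighted mean: divide by the sum of the weights that were used
    return -(sample_w * picked).sum() / sample_w.sum()

# Toy data, for illustration only
torch.manual_seed(0)
n_classes = 3
logits = torch.randn(8, n_classes)
targets = torch.randint(0, n_classes, (8,))
loss_weights = torch.full((n_classes,), 1.0 / n_classes, requires_grad=True)

loss = weighted_cross_entropy(logits, targets, loss_weights)
loss.backward()  # gradients now reach loss_weights
```

After backward(), loss_weights.grad is populated, so loss_weights can be handed to an optimizer like any other parameter. The value of the loss should agree with F.cross_entropy(logits, targets, weight=...) for the same weights.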