It appears that for CrossEntropyLoss, .to() does not move the weight tensor to the same device as the module. It would be great to know whether this is by design or a bug that has been fixed in newer releases.
Minimal example, adapted from the usage docs:
>>> loss = nn.CrossEntropyLoss(weight=torch.FloatTensor([1, 2, 3, 4, 5])).cuda()
>>> input = torch.randn(3, 5, requires_grad=True).cuda()
>>> target = torch.empty(3, dtype=torch.long).random_(5).cuda()
>>> output = loss(input, target)
It will throw the following exception:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0!
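
As a workaround that seems to avoid the mismatch (just a sketch, assuming a single CUDA device is available), the weight tensor can be created on the target device up front so the criterion, input, and target all live on cuda:0:

import torch
import torch.nn as nn

device = torch.device("cuda:0")

# Build the class weights directly on the GPU so the criterion's
# weight buffer already lives on the same device as the inputs.
weight = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0], device=device)
loss = nn.CrossEntropyLoss(weight=weight)

input = torch.randn(3, 5, requires_grad=True, device=device)
target = torch.randint(0, 5, (3,), device=device)

output = loss(input, target)  # runs without the device-mismatch error
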
Looking at the _WeightedLoss module, the weight parameter is registered as a buffer, so I would assume it should be moved to the same device when .to() is called on the loss module. Is that the expected behavior?
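
For what it's worth, here is a small check (a sketch, assuming CUDA is available) that prints where the weight buffer ends up before and after moving the module:

import torch
import torch.nn as nn

loss = nn.CrossEntropyLoss(weight=torch.FloatTensor([1, 2, 3, 4, 5]))

# _WeightedLoss registers `weight` via register_buffer, so it shows up
# among the module's named buffers and is accessible as an attribute.
print(dict(loss.named_buffers()))  # {'weight': tensor([1., 2., 3., 4., 5.])}
print(loss.weight.device)          # cpu

loss = loss.cuda()
# If .cuda()/.to() move buffers as expected, this should now print cuda:0.
print(loss.weight.device)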