What does it mean to move a loss function to device (GPU)?

loss = nn.CrossEntropyLoss().to(device)

It makes sense to me to move a tensor or a NN to the GPU, but not a function.
Finally, if I write my own loss, why doesn’t it have the .to() method?


In the context of nn.Module, the to() method is explained in the documentation here.
Only parameters and buffers are moved. Hence, moving a loss function like nn.CrossEntropyLoss to the GPU doesn’t change anything, since by default it holds no parameters or buffers. A custom loss function that subclasses nn.Module inherits .to(), which will move any of its parameters/buffers to the GPU.
Your custom loss function probably did not subclass nn.Module, which is why it does not have the .to() method.
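To illustrate, here is a minimal sketch of a custom loss that subclasses nn.Module. The class name `WeightedMSELoss` and the `weight` buffer are hypothetical examples, not from the thread; the point is that `register_buffer` makes the tensor part of the module's state, so the inherited .to() will move it:

```python
import torch
import torch.nn as nn

class WeightedMSELoss(nn.Module):
    """Illustrative custom loss: per-element weighted MSE."""
    def __init__(self, weight):
        super().__init__()
        # register_buffer makes `weight` part of the module's state,
        # so .to(device) / .cuda() will move it along with any parameters
        self.register_buffer("weight", weight)

    def forward(self, pred, target):
        return (self.weight * (pred - target) ** 2).mean()

loss_fn = WeightedMSELoss(torch.tensor([1.0, 2.0]))
print(loss_fn.weight.device)               # starts on the CPU
# loss_fn.to("cuda") would move the buffer to the GPU (if one is available)

# A plain nn.CrossEntropyLoss has no parameters, so there is nothing to move
print(list(nn.CrossEntropyLoss().parameters()))
```

A function defined with `def my_loss(pred, target): ...` has none of this machinery, which is why calling `.to()` on it fails.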


Which implementation is faster?
Having criterion = nn.CrossEntropyLoss()

but calling the loss with predictions (GPU tensor) and target (GPU tensor):

  1. criterion(predictions, target)
  2. criterion(predictions.cpu(), target.cpu())

Is one of the above faster, or do both perform at the same speed?


If the model predictions and targets are already on the GPU, I would keep them there: the cpu() call forces a device synchronization and a device-to-host copy, which can lower the performance.
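A minimal sketch of the two variants, assuming nothing beyond standard PyTorch (it falls back to the CPU when no GPU is present, in which case the two options are identical):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
device = "cuda" if torch.cuda.is_available() else "cpu"

pred = torch.randn(128, 10, device=device)
target = torch.randint(0, 10, (128,), device=device)

# Option 1: stay on the current device -- no transfer, no forced sync
loss_gpu = criterion(pred, target)

# Option 2: copy both tensors to the CPU first.
# .cpu() blocks until the pending GPU work producing these tensors
# finishes, adding a device-to-host transfer and a sync point.
loss_cpu = criterion(pred.cpu(), target.cpu())

# The values agree (up to floating-point differences); only the cost differs
print(torch.allclose(loss_gpu.cpu(), loss_cpu, atol=1e-5))
```

Note the synchronization matters because CUDA kernels launch asynchronously; keeping the loss on the GPU lets the CPU keep queuing work instead of waiting.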