What does it mean to move a loss function to device (GPU)?

loss = nn.CrossEntropyLoss().to(device)

It makes sense to me to move a tensor or a NN to the GPU, but not a function.
Finally, if I write my own loss function, why doesn't it have a .to() method?


In the context of nn.Module, the "to" method is explained in the documentation here.
The move only applies to parameters and buffers. Hence, moving a loss function like CrossEntropyLoss to the GPU doesn't change anything. In a custom loss function written by subclassing nn.Module, ".to()" is inherited and will move any parameters/buffers to the GPU.
The custom loss function you wrote may not have subclassed nn.Module and hence did not have the ".to()" method.
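As a minimal sketch (the class name and weights are made up for illustration), a custom loss that subclasses nn.Module and registers its internal tensor as a buffer gets ".to()" for free, and the call moves that buffer to the target device:

import torch
import torch.nn as nn

# Hypothetical custom loss with an internal tensor (per-class weights) stored as a buffer.
class WeightedMSELoss(nn.Module):
    def __init__(self, weights):
        super().__init__()
        # register_buffer makes the tensor visible to .to(), .cuda(), state_dict(), etc.
        self.register_buffer("weights", weights)

    def forward(self, input, target):
        return torch.mean(self.weights * (input - target) ** 2)

device = "cuda" if torch.cuda.is_available() else "cpu"
criterion = WeightedMSELoss(torch.tensor([1.0, 2.0, 3.0])).to(device)  # moves the buffer
loss = criterion(torch.randn(4, 3, device=device), torch.randn(4, 3, device=device))

A plain stateless CrossEntropyLoss has no parameters or buffers, so the same .to(device) call is a no-op.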


Which implementation is faster?
Having criterion = nn.CrossEntropyLoss()

but calling the loss with predictions (GPU tensor) and target (GPU tensor):

  1. criterion(predictions, target)
  2. criterion(predictions.cpu(), target.cpu())

Which of the two above is faster, or do both run at the same speed?


If the model predictions and targets are already on the GPU, I would keep them there, since the .cpu() operation would synchronize the code and might lower performance.
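A rough timing sketch of the two options (assuming a CUDA device is available; the exact numbers will vary with the GPU and tensor sizes):

import time
import torch
import torch.nn as nn

device = "cuda"
criterion = nn.CrossEntropyLoss()
predictions = torch.randn(1024, 100, device=device)
target = torch.randint(0, 100, (1024,), device=device)

def timed(fn, iters=100):
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

gpu_time = timed(lambda: criterion(predictions, target))              # stays on the GPU
cpu_time = timed(lambda: criterion(predictions.cpu(), target.cpu()))  # forces a sync + device-to-host copy
print(f"on GPU: {gpu_time:.6f}s, via .cpu(): {cpu_time:.6f}s")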


If I were implementing my own loss function, would it make sense to move all of my intermediate tensors to the GPU? It would make sense to me, but I'm not totally sure what CrossEntropyLoss does. Does CrossEntropyLoss do all of its work on the GPU, or does it do all calculations on the CPU?

It depends on the inputs you are passing to this loss function, i.e. if the model output and targets are on the GPU, the computation will also be performed on the GPU. If your custom loss function has internal state stored as tensors, you should move it to the same device before calculating the loss. If it's stateless, you can just pass the inputs to it to calculate the loss.
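For example, a stateless custom loss written as a plain function (the function below is a hypothetical illustration, not a library API) has nothing to move; the computation simply happens on whatever device the inputs live on:

import torch

# Hypothetical stateless loss: no internal tensors, so there is nothing to call .to() on.
def smooth_l1(input, target, beta=1.0):
    diff = torch.abs(input - target)
    return torch.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta).mean()

device = "cuda" if torch.cuda.is_available() else "cpu"
pred = torch.randn(8, 5, device=device)
target = torch.randn(8, 5, device=device)
loss = smooth_l1(pred, target)   # computed on the same device as the inputs
print(loss.device)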