Why do we need to call loss.cuda() when we have already called model.cuda()?

I am following the source code that accompanies a paper in which the authors trained a model for a segmentation task. In the code, the authors wrote:

 if torch.cuda.is_available():

I don’t understand why the authors felt the need to send the loss functions to CUDA when the model had already been sent to the device. I have used TensorFlow in the past and am new to PyTorch, so I have trouble understanding device initialization for models and loss functions.


I do not know exactly how those loss functions are implemented (they may have been modified), but if a criterion has no parameters or buffers, sending it to CUDA makes no difference: .cuda() only moves a module's parameters and buffers to the GPU, so there is nothing to move.
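To see what .cuda() would actually move, one can count a module's parameters and buffers. In this sketch nn.Conv2d stands in for the segmentation model and nn.CrossEntropyLoss for the paper's loss; both are assumptions, since the thread shows neither, and movable_tensors is a hypothetical helper.

```python
import torch.nn as nn


def movable_tensors(module: nn.Module) -> int:
    """Hypothetical helper: count the tensors Module.cuda()/.to() would move."""
    return sum(1 for _ in module.parameters()) + sum(1 for _ in module.buffers())


model = nn.Conv2d(3, 8, kernel_size=3)  # stand-in for the segmentation model
criterion = nn.CrossEntropyLoss()       # stand-in for the paper's loss

print(movable_tensors(model))      # 2 (conv weight and bias)
print(movable_tensors(criterion))  # 0, so criterion.cuda() has nothing to move
```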


What do you mean by ‘if the criterion does not have any parameters’? So far, I have only seen non-parameterized loss functions, i.e., loss functions that take some input tensors, compute the loss on them, and return the result.

Do you mean whether the loss function is applied to CUDA tensors? I can imagine the loss function being applied to CUDA tensors.


Yes, exactly. Parameterized loss functions are rare, which is why you seldom see loss.cuda() in code.
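One common case where moving the criterion does matter is a class-weighted loss: nn.CrossEntropyLoss registers its weight argument as a buffer, so .cuda() or .to(device) moves it along with the module. If the weights stayed on the CPU while the inputs were on the GPU, the loss call would typically fail with a device-mismatch error. A sketch with made-up weights for three classes (the values are purely illustrative):

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class_weights = torch.tensor([1.0, 2.0, 0.5])  # hypothetical per-class weights
criterion = nn.CrossEntropyLoss(weight=class_weights)
buffer_names = [name for name, _ in criterion.named_buffers()]
print(buffer_names)  # ['weight']

# Moving the criterion moves its `weight` buffer to the same device.
criterion = criterion.to(device)

logits = torch.randn(4, 3, device=device)            # batch of 4, 3 classes
targets = torch.tensor([0, 2, 1, 1], device=device)  # class indices
loss = criterion(logits, targets)
print(loss.device == criterion.weight.device)  # True
```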

Here, there is no need to call loss.cuda(), since all of its inputs are already CUDA tensors.
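A sketch of the parameter-free case, assuming nn.MSELoss as an example criterion: the loss module is never moved, yet the loss and its gradients are computed on whatever device the input tensors live on. It falls back to the CPU when no GPU is present.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

criterion = nn.MSELoss()  # parameter-free: deliberately NOT moved to `device`

prediction = torch.randn(2, 5, device=device, requires_grad=True)
target = torch.randn(2, 5, device=device)

loss = criterion(prediction, target)  # runs on the inputs' device
loss.backward()
print(loss.device.type == device.type)             # True
print(prediction.grad.device.type == device.type)  # True
```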

Right. As described in my original post, the softmax function has also been moved to CUDA. I briefly reviewed the source code for nn.Softmax and saw these lines:

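# imports needed to run this excerpt on its own
from typing import Optional

import torch.nn.functional as F
from torch import Tensor
from torch.nn import Module

class Softmax(Module):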
    __constants__ = ['dim']
    dim: Optional[int]

    def __init__(self, dim: Optional[int] = None) -> None:
        super(Softmax, self).__init__()
        self.dim = dim

    def __setstate__(self, state):
        super(Softmax, self).__setstate__(state)  # restore the attributes saved when pickling
        if not hasattr(self, 'dim'):
            self.dim = None

    def forward(self, input: Tensor) -> Tensor:
        return F.softmax(input, self.dim, _stacklevel=5)

    def extra_repr(self) -> str:
        return 'dim={dim}'.format(dim=self.dim)
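The excerpt suggests that nn.Softmax holds no tensors at all, only the integer dim, and that forward just calls F.softmax. Moving a module to CUDA is often done out of habit, but only modules that hold parameters or buffers have anything to move. A quick check, assuming a recent PyTorch version, that the module's state_dict is empty (so softmax.cuda() is effectively a no-op) and that it matches the functional form:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

softmax = nn.Softmax(dim=1)
print(len(softmax.state_dict()))  # 0: no parameters or buffers to move

x = torch.randn(2, 4)
# The module simply forwards to F.softmax using its stored dim.
print(torch.allclose(softmax(x), F.softmax(x, dim=1)))  # True
```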

I am not sure why __setstate__ is defined here or what state it sets. How does it receive the state? Also, I noticed that modules inheriting from nn.Module are usually the ones that get moved to CUDA. Any thoughts?
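__setstate__ belongs to Python's pickle protocol and has nothing to do with CUDA: when an object is unpickled (for example, when a whole model saved with torch.save is loaded again), pickle calls __setstate__(state) with the state captured at save time, which by default is the instance's __dict__. The `if not hasattr(self, 'dim')` check reads like a backward-compatibility shim for objects saved before the dim attribute existed; that is an inference from the code, not something the thread confirms. A minimal sketch in plain Python, using a hypothetical SoftmaxLike class so that no PyTorch is needed:

```python
import pickle
from typing import Optional


class SoftmaxLike:
    """Hypothetical class mimicking the __setstate__ pattern of nn.Softmax."""

    def __init__(self, dim: Optional[int] = None) -> None:
        self.training = True  # stands in for the other state an nn.Module carries
        self.dim = dim

    def __setstate__(self, state: dict) -> None:
        # Called by pickle while loading; `state` is the instance __dict__
        # that was saved when the object was pickled.
        self.__dict__.update(state)
        # Objects pickled by an older version of the class may lack `dim`.
        if not hasattr(self, "dim"):
            self.dim = None


# Normal round trip: the saved state includes dim=1.
restored = pickle.loads(pickle.dumps(SoftmaxLike(dim=1)))
print(restored.dim)  # 1

# Simulate a pickle written before `dim` existed by deleting the attribute.
old = SoftmaxLike(dim=1)
del old.dim
restored_old = pickle.loads(pickle.dumps(old))
print(restored_old.dim)  # None
```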