Move the loss function to GPU

Jindong · June 21, 2018, 2:36pm

Hi everyone,

I have a question about “.cuda()”. In a PyTorch example, I saw code like this:

criterion = nn.CrossEntropyLoss().cuda()

In my code, I don’t do this, so I am wondering whether it is necessary to move the loss function to the GPU.

Thanks

royboy · June 21, 2018, 7:54pm

If your input tensor is a CUDA tensor, the CUDA implementation of the loss function will be used.

ptrblck · June 21, 2018, 8:48pm

In addition to what @royboy said, you need to push your criterion to the GPU if it’s stateful, i.e. if it has some parameters or internal states.
Loss functions are usually purely functional, so moving them is not necessary.
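To illustrate the "just functional" point, here is a minimal sketch (the tensor shapes are arbitrary and chosen only for illustration). It compares an unweighted nn.CrossEntropyLoss module with the functional call F.cross_entropy; both should produce the same value, which suggests the module carries no state of its own in this configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Arbitrary example data: 4 samples, 10 classes (illustrative only).
output = torch.randn(4, 10)
target = torch.randint(0, 10, (4,))

# Module form: no weight is passed, so no tensor is stored on the module.
module_loss = nn.CrossEntropyLoss()(output, target)

# Functional form: the same computation without creating a module.
functional_loss = F.cross_entropy(output, target)

print(torch.allclose(module_loss, functional_loss))
```

Because nothing is stored on the unweighted module, the device of the computation is determined by the input tensors alone.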

Oscar_Rangel · January 13, 2019, 6:44pm

criterion = nn.CrossEntropyLoss().cuda() if torch.cuda.is_available() else nn.CrossEntropyLoss()
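A common alternative to the conditional above is to pick a torch.device once and use .to(device) everywhere. The following is a sketch under that assumption; the variable names and tensor shapes are illustrative and not taken from the thread.

```python
import torch
import torch.nn as nn

# Choose the device once; falls back to the CPU when CUDA is unavailable.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# .to(device) does nothing harmful on the CPU, so no branching is needed.
criterion = nn.CrossEntropyLoss().to(device)

# Illustrative inputs created directly on the chosen device.
output = torch.randn(4, 10, device=device, requires_grad=True)
target = torch.randint(0, 10, (4,), device=device)

loss = criterion(output, target)
loss.backward()
```

The same script then runs unchanged on machines with and without a GPU.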

samra-irshad · July 31, 2020, 1:41am

@ptrblck Could you explain what you mean by ‘if the criterion is stateful, i.e. if it has some parameters or internal states’? I am not sure what “internal states” means.

ptrblck · July 31, 2020, 7:28am

A weight parameter could be seen as an internal state and would yield a device mismatch error if it stays on the CPU while the inputs are on the GPU.
Of course you could create the weight tensor as a CUDA tensor directly, but you could also move the criterion to the device:

import torch
import torch.nn as nn

output = torch.randn(10, 10, requires_grad=True, device='cuda')
target = torch.randint(0, 10, (10,), device='cuda')

weight = torch.empty(10).uniform_(0, 1)
criterion = nn.CrossEntropyLoss(weight=weight)

loss = criterion(output, target) # error
> RuntimeError: Expected object of device type cuda but got device type cpu for argument #3 'weight' in call to _thnn_nll_loss_forward

criterion.cuda()
loss = criterion(output, target) # works
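Both options mentioned above (creating the weight on the target device, or moving the criterion afterwards) can be sketched side by side. This version falls back to the CPU when no GPU is present, which is an assumption added here so the snippet runs anywhere; the names criterion_a and criterion_b are illustrative.

```python
import torch
import torch.nn as nn

# Assumption: fall back to the CPU so the sketch runs without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

output = torch.randn(10, 10, requires_grad=True, device=device)
target = torch.randint(0, 10, (10,), device=device)

# Option 1: create the weight tensor directly on the target device.
weight = torch.empty(10, device=device).uniform_(0, 1)
criterion_a = nn.CrossEntropyLoss(weight=weight)

# Option 2: start from a CPU weight and move the whole criterion.
criterion_b = nn.CrossEntropyLoss(weight=weight.cpu()).to(device)

loss_a = criterion_a(output, target)
loss_b = criterion_b(output, target)

# Both criteria hold the same weights, so the losses should agree.
print(torch.allclose(loss_a, loss_b))
```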

samra-irshad · August 1, 2020, 8:12am

@ptrblck Right, but I do not understand why you passed the weight parameter to the loss function in criterion = nn.CrossEntropyLoss(weight=weight). I have never seen anyone pass a weight to a loss function.

ptrblck · August 1, 2020, 8:15am

The weight argument can be used to create a class weighting, as described in the docs of the criterion. It’s sometimes used, e.g., to counter the overfitting effects of training a model on imbalanced datasets.
Weighted loss functions are not new in deep learning and were already used in the “classical” machine learning domain.
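As a rough illustration of class weighting, the sketch below derives inverse-frequency weights from made-up class counts. Both the counts and the weighting formula are assumptions chosen for the example; other weighting schemes are equally possible. With reduction="none", each sample's loss is scaled by the weight of its target class.

```python
import math

import torch
import torch.nn as nn

# Hypothetical class counts for an imbalanced 3-class dataset.
class_counts = torch.tensor([900.0, 90.0, 10.0])

# Inverse-frequency weighting (one possible scheme): rare classes get larger weights.
weights = class_counts.sum() / (len(class_counts) * class_counts)

# reduction="none" keeps the per-sample losses so the scaling is visible.
criterion = nn.CrossEntropyLoss(weight=weights, reduction="none")

# Uniform logits: every sample has an unweighted loss of log(3).
logits = torch.zeros(3, 3)
target = torch.tensor([0, 1, 2])

per_sample = criterion(logits, target)
print(per_sample)  # each entry equals weights[target] * log(3)
```

Mistakes on the rare class therefore contribute more to the total loss than mistakes on the frequent class.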

samra-irshad · August 1, 2020, 8:32am

Right. So these are class weights. I am not sure I correctly understand the meaning of “internal state”. How would you define it, and does the internal state vary from module to module? In the source code for nn.Softmax(), I can see some lines about ‘state’, i.e.,

def __setstate__(self, state):
    self.__dict__.update(state)
    if not hasattr(self, 'dim'):
        self.dim = None

Is this the internal state you are referring to? I also noticed that modules and functions inheriting from nn.Module are mostly moved to CUDA. Any thoughts?

ptrblck · August 2, 2020, 2:13am

By “internal states” I mean all class attributes in the modules.
In particular, buffers and parameters are of interest, as they would need to be pushed to the appropriate device (e.g. the weight buffer in nn.CrossEntropyLoss).
The dim argument in nn.Softmax, however, will not be pushed to the device, as it’s a plain Python integer that specifies the dimension along which the softmax is applied.
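One way to see which attributes would be moved is to list a module's registered buffers and parameters. The sketch below does this for a weighted nn.CrossEntropyLoss and for nn.Softmax; the variable names are illustrative.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(weight=torch.rand(10))
softmax = nn.Softmax(dim=1)

# Registered buffers are tensors that .cuda() / .to(device) will move.
loss_buffers = [name for name, _ in criterion.named_buffers()]
softmax_buffers = [name for name, _ in softmax.named_buffers()]
softmax_params = list(softmax.parameters())

print(loss_buffers)        # the class weights are registered as a buffer
print(softmax_buffers)     # nn.Softmax registers no buffers
print(type(softmax.dim))   # dim is a plain Python int, not a tensor
```

Since nn.Softmax holds no tensors, there is nothing on it that could end up on the wrong device.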

Mohamed_Desouky · February 18, 2021, 11:03pm

I tried this and got no error:

input = torch.randn(10, 10, requires_grad=True, device='cuda')

criterion = nn.Softmax(dim = 1)

loss = criterion(input) # no error