Expected object of device type cuda but got device type cpu for argument #2 'target' in call to _thnn_nll_loss_forward

I’m trying to train with CrossEntropyLoss but my argmax method is failing with

RuntimeError: Expected object of device type cuda but got device type cpu for argument #2 'target' in call to _thnn_nll_loss_forward

at the loss = criterion(prediction, target_index) line. Can anyone point to my mistake?

    training_dataset_generator = torch.utils.data.DataLoader(traning_dataset, **params)
    criterion = torch.nn.CrossEntropyLoss(reduction='sum', weight=torch.Tensor([1.0, 10.0]))
    optimizer = torch.optim.SGD(model.parameters(), lr=ml_def.learning_rate)

    for epoch in range(max_epochs):
        print("doing epoch")
        epoch_loss = 0.0
        for batch, labels in training_dataset_generator:
            # transfer to gpu
            local_batch, local_labels = batch.to(ml_def.device), labels.to(ml_def.device)

            prediction = model(local_batch)

            local_labels_float = local_labels.float()

            values, indices = local_labels.max(0)

            target_index = Variable(torch.LongTensor([indices[0]]))

            loss = criterion(prediction, target_index)

            epoch_loss += loss

            #print("val", local_batch[0], "original", labels[0], "prediction", prediction.detach()[0], "loss", loss)

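I am not entirely sure what the line local_labels_float = local_labels.float() is doing in this code. local_labels_float is not used anywhere.

Also, why not just transfer target_index to the GPU directly, instead of sending labels there and then doing the max and indexing operations? In your loop, target_index is rebuilt with torch.LongTensor([...]), which creates a new tensor on the CPU, so maybe that's the problem.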

    for batch, labels in training_dataset_generator:
        # labels is still on the CPU here; take the max before transferring
        values, indices = labels.max(0)
        target_index = indices[0]
        # transfer to gpu
        local_batch, target_index = batch.to(ml_def.device), target_index.to(ml_def.device)

        prediction = model(local_batch)
        loss = criterion(prediction, target_index)

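Note that indices is already a LongTensor, so there is no need to wrap it in torch.LongTensor or Variable. By the looks of it, you may also have to (although I am not entirely sure) use an unsqueeze operation on target_index, i.e. target_index = target_index.unsqueeze(0), before transferring it to the GPU, because for a prediction of shape (N, C) the loss expects a target of shape (N,).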
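A small sketch of what the unsqueeze does to the shape, using a hypothetical prediction of shape (1, 2) and a made-up class index rather than your real data:

```python
import torch

criterion = torch.nn.CrossEntropyLoss()
prediction = torch.randn(1, 2)            # hypothetical model output, shape (N=1, C=2)

# A single element picked out of a LongTensor is 0-dimensional.
target_index = torch.tensor(1)
shape_before = tuple(target_index.shape)  # ()

# unsqueeze(0) adds the batch dimension that a (N, C) prediction needs.
target_index = target_index.unsqueeze(0)
shape_after = tuple(target_index.shape)   # (1,)

loss = criterion(prediction, target_index)
```

Whether you need this depends on the shape your model actually returns, so it is worth printing prediction.shape and target_index.shape once before the loss call.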
