Cross entropy loss for 100+ targets [Expected floating point type/ "host_softmax" not implemented for 'Long']


I’m running into an issue with cross entropy loss. When my target variable is one-hot encoded, e.g. [0,1,0,…,0], I get RuntimeError: Expected floating point type for target with class probabilities, got Long.

On the other hand, if I skip the one-hot encoding and pass my target variable as is, I get RuntimeError: “host_softmax” not implemented for ‘Long’

class BertClass(torch.nn.Module):
    def __init__(self):
        super(BertClass, self).__init__()
        self.model = BertModel.from_pretrained('bert-base-uncased')
        self.l1 = torch.nn.Linear(768, 512) # linear layer 1
        self.l2 = torch.nn.Linear(512, len(lb.classes_)) # num. of classes
        self.dropout = torch.nn.Dropout(0.2)

    def forward(self, input_ids, attention_mask, token_type_ids):
        output = self.model(input_ids=input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)
        hidden_state = output[0]
        output = hidden_state[:, 0]
        output = self.l1(torch.nn.ReLU()(output))
        output = self.dropout(output)
        output = self.l2(output)
        return output

model = BertClass()
model = model.to(device)

class_weights = class_weight.compute_class_weight(class_weight = 'balanced', 
                                  classes = np.unique(available_df['controlId']), 
                                  y = available_df['controlId'])
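class_weights = torch.tensor(class_weights, dtype=torch.float) # compute_class_weight returns float64; the loss weight should match the float32 logits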

loss_function = torch.nn.CrossEntropyLoss(weight=class_weights)
optimizer = torch.optim.Adam(params =  model.parameters(), lr=LEARNING_RATE)

def train(epoch):
    tr_loss = 0
    n_correct = 0
    nb_tr_steps = 0
    nb_tr_examples = 0
    for _,data in enumerate(train_loader, 0):
        ids = data['ids'].to(device)
        mask = data['mask'].to(device)
        token_type_ids = data['token_type_ids'].to(device)
        targets = data['targets'].to(device)
        outputs = model(ids, mask, token_type_ids).to(device)
        outputs = outputs.long()
        print(outputs.size())# torch.Size([8, 325])
        print(targets.size()) # with one-hot encoding torch.Size([8, 325]). Without one-hot encoding torch.Size([8])
        loss = loss_function(outputs, targets.long())
        tr_loss += loss.item()
        big_val, big_idx = torch.max(outputs, dim=1)
        n_correct += calculate_accuracy(big_idx, targets)

        nb_tr_steps += 1
        nb_tr_examples += targets.size(0) # count samples so the accuracy division is not by zero
        if _%1000==0:
            loss_step = tr_loss/nb_tr_steps
            accu_step = (n_correct*100)/nb_tr_examples 
            print(f"Training Loss per 1000 steps: {loss_step}")
            print(f"Training Accuracy per 1000 steps: {accu_step}")

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    print(f'The Total Accuracy for Epoch {epoch}: {(n_correct*100)/nb_tr_examples}')
    epoch_loss = tr_loss/nb_tr_steps
    epoch_accu = (n_correct*100)/nb_tr_examples
    print(f"Training Loss Epoch: {epoch_loss}")
    print(f"Training Accuracy Epoch: {epoch_accu}")

    return epoch_accu 

for epoch in range(EPOCHS):
    train(epoch)

That’s expected as mentioned in the docs.
If you want to pass probabilities (in this case as a one-hot encoded tensor) the target has to be a floating point tensor.
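
To make the two accepted target formats concrete, here is a minimal sketch with made-up shapes (5 classes instead of the 325 in the question, random logits standing in for the model output). It assumes a PyTorch version that supports class-probability targets in nn.CrossEntropyLoss (1.10 or newer, as far as I know).

```python
import torch

torch.manual_seed(0)
num_classes = 5  # hypothetical; the model in the question has 325 classes

# Stand-in for the model output: raw float logits, shape [batch, classes]
logits = torch.randn(8, num_classes)
# Class indices: LongTensor, shape [batch]
class_idx = torch.randint(0, num_classes, (8,))

criterion = torch.nn.CrossEntropyLoss()

# Format 1: class indices as a LongTensor of shape [8]
loss_idx = criterion(logits, class_idx)

# Format 2: class probabilities as a float tensor of shape [8, 5];
# a one-hot tensor is a valid probability target once cast to float
one_hot = torch.nn.functional.one_hot(class_idx, num_classes).float()
loss_prob = criterion(logits, one_hot)

print(loss_idx.item(), loss_prob.item())
```

Both calls should give the same loss value, so for hard labels the index format is the simpler choice.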

I’m unsure why you want to apply a softmax on the target, which should contain class indices in case it’s a LongTensor, so could you explain your use case a bit more, please?

Thanks for your prompt response. I’m still new to this so thank you for being patient.
I’ve been reading that for cross entropy loss we shouldn’t convert the targets to one-hot encoding. In that case, would you recommend leaving the 100+ labels as is and performing prediction? Also, to get the indices, I assume we would select the index with the highest probability for each sample in the batch after model(ids, mask, token_type_ids), which makes the output length 8?

Also, the reason I’m using softmax via cross entropy here is that I want to get the probabilities for the classes and then compute the loss between my one-hot encoded ground truth and the probability output of my prediction. Please correct me if my understanding is wrong.

In case your target is one-hot encoded you can create the class indices via torch.argmax, yes.
Apply it to the targets, though, not to the model output.
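
A small sketch of that conversion, using a made-up 3-sample, 4-class one-hot target (not the data from the question):

```python
import torch

# Hypothetical one-hot targets: 3 samples, 4 classes, stored as a LongTensor
one_hot_targets = torch.tensor([[0, 1, 0, 0],
                                [0, 0, 0, 1],
                                [1, 0, 0, 0]])

# argmax over the class dimension recovers the class index of each sample
class_indices = torch.argmax(one_hot_targets, dim=1)
print(class_indices)        # tensor([1, 3, 0])
print(class_indices.dtype)  # torch.int64

# The indices can be passed directly as the target; the logits stay float
logits = torch.randn(3, 4)  # stand-in for the raw model output
loss = torch.nn.CrossEntropyLoss()(logits, class_indices)
```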

  1. nn.CrossEntropyLoss expects raw logits as the model output, so don’t use a softmax on it and pass the raw logits to the criterion.
  2. The error message indicates you are applying softmax on the target, which is why I asked for more details.
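
To illustrate point 1, here is a rough sketch (random tensors, hypothetical shapes) showing that the criterion already applies log_softmax internally, and that a softmax is only needed if you want to look at the probabilities. The predicted class is the same whether you take the argmax of the logits or of the probabilities.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 6)               # stand-in for raw model output
targets = torch.tensor([2, 0, 5, 1])     # class indices (LongTensor)

# What the criterion computes from raw logits ...
loss = F.cross_entropy(logits, targets)
# ... matches log_softmax followed by the negative log likelihood loss
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# Softmax is only for inspecting probabilities, not for the loss input
probs = F.softmax(logits, dim=1)
pred_from_logits = logits.argmax(dim=1)
pred_from_probs = probs.argmax(dim=1)

print(loss.item(), manual.item())
print(pred_from_logits, pred_from_probs)
```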

Thank you for all the insights. It’s not clear to me which line of my code applies softmax to my target. Can you help me point it out?

The error RuntimeError: “host_softmax” not implemented for ‘Long’ indicates that you are applying a softmax operation on a LongTensor, which should be the target unless you are explicitly casting the floating point outputs of your model to .long().
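
In the posted training loop, the line outputs = outputs.long() is such a cast. A minimal sketch to reproduce it, with random tensors standing in for the model output and targets (the exact wording of the error message may differ between PyTorch versions):

```python
import torch

torch.manual_seed(0)
criterion = torch.nn.CrossEntropyLoss()

outputs = torch.randn(8, 325)           # stand-in for float logits from the model
targets = torch.randint(0, 325, (8,))   # class indices (LongTensor), shape [8]

# Casting the logits to long reproduces the failure: the internal
# log_softmax is not implemented for integer tensors
try:
    criterion(outputs.long(), targets)
    raised = False
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

# Keeping the logits as floats and the targets as long works
loss = criterion(outputs, targets)
print(loss.item())
```

So dropping the .long() call on the outputs and keeping it only on the class-index targets should be enough here.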