Type mismatch on CrossEntropyLoss

deepcode · July 17, 2017, 7:39pm

I am taking the output of a linear layer, reshaping it, and then running CrossEntropyLoss on it. That line fails with a type mismatch:

TypeError: CudaSpatialClassNLLCriterion_updateOutput received an invalid combination of arguments - got (int, torch.cuda.FloatTensor, torch.cuda.FloatTensor, torch.cuda.FloatTensor, bool, NoneType, torch.cuda.FloatTensor), but expected (int state, torch.cuda.FloatTensor input, torch.cuda.LongTensor target, torch.cuda.FloatTensor output, bool sizeAverage, [torch.cuda.FloatTensor weights or None], torch.cuda.FloatTensor total_weight)

The difference seems to be the third argument: it expects a torch.cuda.LongTensor target but got a torch.cuda.FloatTensor.

I have printed out all my types and everything seems to be a FloatTensor. I don't understand where the Long requirement is coming from.

This is the relevant code for training:

criterion = nn.CrossEntropyLoss()
for epoch in range(args.num_epochs):
  for i, (images, captions, lengths) in enumerate(data_loader):
    decoder.zero_grad()
    encoder.zero_grad()   
    images = Variable(images,volatile=False)
    features = encoder(images)
    outputs = decoder(features, captions, lengths)
    loss = criterion(outputs, images)

and in decoder:

def __init__(self, embed_size, hidden_size, vocab_size, num_layers):
    super(DecoderRNN, self).__init__()
    self.embed = nn.Embedding(vocab_size, embed_size)
    self.lstm = nn.LSTM(embed_size, hidden_size, num_layers, batch_first=True)
    self.linear = nn.Linear(hidden_size, vocab_size)
    # in_features inferred from self.linear's output size; 4800 = 3 * 40 * 40
    self.linear_two = nn.Linear(vocab_size, 4800)
    self.init_weights()

def forward(self, features, captions, lengths):
    embeddings = self.embed(captions)

    embeddings = torch.cat((features.unsqueeze(1), embeddings), 1)
    packed = pack_padded_sequence(embeddings, lengths, batch_first=True)
    hiddens, _ = self.lstm(packed)
    outputs = self.linear(hiddens[0])
    outputs = self.linear_two(outputs)
    # reshape each row of 4800 values into a 3 x 40 x 40 block
    outputs = outputs.view(outputs.size(0), 3, 40, 40)
    print("output rnn type" + str(type(outputs.data)))
    return outputs

If I remove these 2 lines, I can get the code running again:
outputs = self.linear_two(outputs)
outputs = outputs.view(outputs.size(0), 3, 40, 40)

Can anyone give me some insight into what I'm doing wrong here?

deepcode · July 17, 2017, 8:31pm

I found this thread which has a similar issue:

I printed out the types for all my variables and everything is a float. I don't know why it expects a Long tensor all of a sudden, and I can't find any data that is of type Long. I am stuck here; any help would be appreciated.

tom · July 17, 2017, 8:59pm

The instance of CrossEntropyLoss expects a long tensor with the target classes as second input. Quite possibly you are looking for a different loss function.
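
To make the reason for the Long requirement concrete, here is a minimal plain-Python sketch of what a cross-entropy loss computes. It is not PyTorch's implementation; the helper names cross_entropy and mse are hypothetical and exist only for this illustration. The point is that each target value is used as an index into the per-class scores, so it has to be an integer class label, whereas a float-valued target such as an image would call for a regression-style loss (mean squared error is used here purely as an example).

```python
import math

def cross_entropy(logits, target):
    """Mean cross-entropy over a batch (illustrative, not PyTorch's code).

    logits: list of rows, each a list of float scores, one score per class.
    target: list of int class indices, one per row.
    """
    total = 0.0
    for row, cls in zip(logits, target):
        if not isinstance(cls, int):
            # A float target cannot be used to index into the class scores.
            raise TypeError("target must hold integer class indices, got %r" % (cls,))
        # Numerically stable log-sum-exp over the class scores.
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        # Negative log-probability assigned to the true class.
        total += log_sum_exp - row[cls]
    return total / len(logits)

def mse(outputs, target):
    """Mean squared error between two equally shaped lists of float rows."""
    diffs = [(o - t) ** 2 for ro, rt in zip(outputs, target) for o, t in zip(ro, rt)]
    return sum(diffs) / len(diffs)

logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]

# Integer class indices work as targets for cross-entropy.
class_loss = cross_entropy(logits, [0, 2])

# Float-valued targets fit a regression loss instead.
regression_loss = mse(logits, [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
```

In the training loop from the question, the second argument passed to criterion is images, which is a float tensor, so it cannot serve as a tensor of class indices.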

Best regards

Thomas

deepcode · July 17, 2017, 9:42pm

@tom Thanks for the help. That does seem to be the issue.
I had looked at the docs earlier. None of the loss functions have docs for the instance method. If those were there, I bet they would prevent a lot of confusion for other people.