I take the output of a linear layer, reshape it, and then run CrossEntropyLoss on it. The loss call fails with a type mismatch:
TypeError: CudaSpatialClassNLLCriterion_updateOutput received an invalid combination of arguments - got (int, torch.cuda.FloatTensor, torch.cuda.FloatTensor, torch.cuda.FloatTensor, bool, NoneType, torch.cuda.FloatTensor), but expected (int state, torch.cuda.FloatTensor input, torch.cuda.LongTensor target, torch.cuda.FloatTensor output, bool sizeAverage, [torch.cuda.FloatTensor weights or None], torch.cuda.FloatTensor total_weight)
The difference seems to be the third argument: it expects a torch.cuda.LongTensor target but got a torch.cuda.FloatTensor.
I have printed all my types and everything seems to be a FloatTensor. I don't understand where the Long requirement is coming from.
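For context on the Long requirement: as documented, nn.CrossEntropyLoss takes float scores as its input and class indices of dtype torch.long as its target. The following is only a minimal CPU sketch of that contract; the sizes and the names `logits` and `target` are illustrative assumptions and are not taken from the training code in this post.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

# Hypothetical sizes chosen to mirror the post: 3 classes over a 40x40 grid.
batch_size, num_classes, height, width = 2, 3, 40, 40

# Input: raw float scores, one per class per position -> (N, C, H, W).
logits = torch.randn(batch_size, num_classes, height, width)

# Target: integer class indices in [0, C), dtype torch.long -> (N, H, W).
target = torch.randint(0, num_classes, (batch_size, height, width))

loss = criterion(logits, target)
print(logits.dtype, target.dtype, loss.item())
```

A float tensor can be converted with `.long()` only if its values really are class indices. Newer PyTorch releases also accept a float target with the same shape as the input and treat it as class probabilities, so the exact error may differ by version.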
This is the relevant code for training:
criterion = nn.CrossEntropyLoss()
for epoch in range(args.num_epochs):
    for i, (images, captions, lengths) in enumerate(data_loader):
        decoder.zero_grad()
        encoder.zero_grad()
        images = Variable(images, volatile=False)
        features = encoder(images)
        outputs = decoder(features, captions, lengths)
        loss = criterion(outputs, images)
And in the decoder:
class DecoderRNN(nn.Module):
    def __init__(self, embed_size, hidden_size, vocab_size, num_layers):
        super(DecoderRNN, self).__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, num_layers, batch_first=True)
        self.linear = nn.Linear(hidden_size, vocab_size)
        # 4800 = 3 * 40 * 40, matching the view() in forward()
        self.linear_two = nn.Linear(vocab_size, 4800)
        self.init_weights()

    def forward(self, features, captions, lengths):
        embeddings = self.embed(captions)
        embeddings = torch.cat((features.unsqueeze(1), embeddings), 1)
        packed = pack_padded_sequence(embeddings, lengths, batch_first=True)
        hiddens, _ = self.lstm(packed)
        outputs = self.linear(hiddens[0])
        outputs = self.linear_two(outputs)
        outputs = outputs.view(outputs.size(0), 3, 40, 40)
        print("output rnn type" + str(type(outputs.data)))
        return outputs
If I remove these two lines, the code runs again:
outputs = self.linear_two(outputs)
outputs = outputs.view(outputs.size(0),3,40,40)
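To make the effect of those two lines concrete: reshaping to (N, 3, 40, 40) turns dimension 1 into the class dimension, and the loss is then evaluated per spatial position, which is presumably why the error names the spatial variant of the criterion. The sketch below compares the shapes CrossEntropyLoss pairs together in each case. It runs on CPU, and the sizes and the stand-in `linear_two` layer are assumptions for illustration, not the actual model.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
batch_size, vocab_size = 4, 10  # hypothetical sizes

# Without the two lines: 2D scores (N, C) pair with a 1D long target (N,).
scores_2d = torch.randn(batch_size, vocab_size)
target_1d = torch.randint(0, vocab_size, (batch_size,))
loss_2d = criterion(scores_2d, target_1d)

# With the two lines: 4D scores (N, 3, 40, 40) pair with a 3D long target
# (N, 40, 40) whose entries are class indices in [0, 3).
linear_two = nn.Linear(vocab_size, 4800)  # stand-in for self.linear_two
scores_4d = linear_two(scores_2d).view(batch_size, 3, 40, 40)
target_3d = torch.randint(0, 3, (batch_size, 40, 40))
loss_4d = criterion(scores_4d, target_3d)

print(scores_2d.shape, target_1d.shape)
print(scores_4d.shape, target_3d.shape)
```

In both cases the target holds integer class indices rather than float values, and it has one dimension fewer than the input.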
Can anyone give me some insight into what I'm doing wrong here?