I have an LSTM model. My pre-processed data is stored as a long tensor, so I first have to cast it to a float tensor. Since this is a classification problem, I use the cross-entropy loss function. But cross-entropy does not seem to accept a float tensor, so I have to cast to a long tensor again. But then I get:

log_softmax_forward is not implemented for type torch.LongTensor

Why is this so difficult? Why doesn’t the framework convert to the required type, since all three types are the same except for how much memory they take up?

Code:

output = train_model(Variable(x.float())) # train_model is LSTM and LL model
# Expected object of type Variable[torch.FloatTensor] but
# found type Variable[torch.DoubleTensor] for argument #1 'mat1'. So has to cast to float.
loss = loss_func(output.long(), Variable(y)) # Loss function is cross-entropy loss function.
# Expected object of type Variable[torch.LongTensor] but found type
# Variable[torch.DoubleTensor] for argument #1 'target'. So I cast to long.
# log_softmax_forward is not implemented for type torch.LongTensor

How do I fix this? I don’t like all this converting. Moreover, even after converting, I still get an error. Where can I read more about this so I can understand it properly?

When there’s a type mismatch like this, it often points to a misunderstanding of what the functions do. For example, there is a reason why CrossEntropyLoss(input, target) takes LongTensor targets and FloatTensor inputs: the targets are class indices, while the inputs are per-class scores.
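Applied to the code in the question, this suggests casting the targets to long and leaving the model output as floats. Here is a minimal sketch of that, assuming PyTorch 0.4 or later (where the Variable wrapper is no longer needed). The layer sizes, tensor shapes, and the stand-in data x and y are made up for illustration and are not taken from the original post.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the data in the question:
# 4 sequences, 5 time steps, 3 features, 2 classes.
x = torch.randint(0, 10, (4, 5, 3))                       # inputs stored as long
y = torch.tensor([0.0, 1.0, 1.0, 0.0], dtype=torch.float64)  # labels stored as double

# Assumed model: an LSTM followed by a linear layer.
lstm = nn.LSTM(input_size=3, hidden_size=8, batch_first=True)
linear = nn.Linear(8, 2)
loss_func = nn.CrossEntropyLoss()

features, _ = lstm(x.float())        # cast the inputs to float once
output = linear(features[:, -1, :])  # per-class scores, float32, shape (4, 2)

# Cast the targets (class indices) to long; do NOT cast the output.
loss = loss_func(output, y.long())
loss.backward()
```

Calling output.long() instead would turn the scores into integers, which is why log_softmax complains in the question.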

I would still like to know whether there is a solution to this conversion problem. It is somewhat annoying. Are there plans to do the conversion automatically, or is there a reason not to?

The reason not to do it is that sometimes there is a misunderstanding around what the functions do. Let’s say I don’t know what the indexing operation tensor[scalar] does. If I pass in a float scalar (0.1) and it gets silently converted to an int scalar (now 0), the operation will work and return something. However, maybe I assumed the [] operation does addition (i.e., tensor[scalar] = tensor + scalar). Then my code would run without complaint, and I would have a hard time tracking down what’s wrong.
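To make the indexing scenario concrete, here is a small sketch that uses a plain Python list as a stand-in for a tensor. The "lenient" branch is hypothetical: it imitates what a silent float-to-int conversion would do, which is not how Python or PyTorch indexing actually behaves.

```python
values = [10.0, 20.0, 30.0]   # stand-in for a tensor
scalar = 0.1

# Strict behaviour: a float index is rejected, so the mistake surfaces at once.
try:
    values[scalar]
    strict_result = "no error"
except TypeError:
    strict_result = "TypeError"

# Hypothetical lenient behaviour: 0.1 is silently truncated to 0,
# so the call "works" and quietly returns the first element.
lenient_result = values[int(scalar)]

# What the confused user thought [] would compute: element-wise addition.
expected = [v + scalar for v in values]
```

With the strict behaviour the misunderstanding shows up on the first run. With the lenient one, lenient_result is a single number rather than the list the user expected, and nothing points at the cause.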

Usually there’s a good reason behind some functions only taking LongTensors and some functions only taking floating-point types. If you see something that you think shouldn’t have these restrictions, please let us know with a forum post or an issue.

I’m all for throwing an error when the types don’t logically work for the given functions. But for example half/float/double automatic conversion as an option would be nice. Same for long/byte/short/int. These are only different types because of efficiency, and there’s no reason cross_entropy shouldn’t work with a byte tensor.
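As a rough illustration of what such an opt-in conversion could look like today, here is a hypothetical wrapper around torch.nn.functional.cross_entropy. The helper name cross_entropy_int_targets and the example tensors are made up for this sketch; it is not part of PyTorch. It accepts any integer dtype for the class indices and still raises for floating-point targets.

```python
import torch
import torch.nn.functional as F

# Integer dtypes this hypothetical helper treats as valid class indices.
_INTEGER_DTYPES = (torch.uint8, torch.int8, torch.int16, torch.int32, torch.int64)

def cross_entropy_int_targets(logits, target):
    """Hypothetical helper: accept class indices of any integer dtype."""
    if target.dtype not in _INTEGER_DTYPES:
        raise TypeError(f"expected integer class indices, got {target.dtype}")
    # Explicit, visible cast to the dtype that cross_entropy expects.
    return F.cross_entropy(logits, target.long())

logits = torch.tensor([[2.0, 0.5], [0.1, 1.5]])
byte_target = torch.tensor([0, 1], dtype=torch.uint8)

loss_from_bytes = cross_entropy_int_targets(logits, byte_target)
loss_from_longs = F.cross_entropy(logits, torch.tensor([0, 1]))
```

Keeping the cast inside a named helper makes the conversion a deliberate choice at the call site while the logically wrong cases still fail loudly.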

It’s also super frustrating to work in 16-bit precision, because so many of the built-in functions either don’t work or are slow with half-precision floats. The slowness can’t even be easily fixed by casting, because then the gradients have the wrong types on the backward pass.