Run_backward: expected dtype Float but got dtype Long

I am trying to run the Variable._execution_engine.run_backward call and I am getting an error:
RuntimeError: expected dtype Float but got dtype Long, even though both tensors are float32. I have attached a picture below showing the output from the debugger along with the error.

We are using the torch.nn.MSELoss function and we are getting similar results with the torch.nn.SmoothL1Loss function. Any ideas on what the problem is or what next steps should be?
We are running PyTorch version 1.4.0.dev20191111.

Could you run with anomaly mode enabled to see which forward function creates this error please?
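For reference, a minimal sketch of how anomaly mode can be enabled (the tensors and model here are placeholders, not the poster's actual code):

```python
import torch

# Enable anomaly detection: if backward raises, PyTorch also reports
# the forward operation that produced the offending gradient.
with torch.autograd.set_detect_anomaly(True):
    pred = torch.randn(3, requires_grad=True)
    loss = (pred * 2).sum()
    loss.backward()  # a failure here would now show two stack traces
```

Anomaly mode slows execution down, so it is meant for debugging runs only.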

So I reran using the following code and still received the same error (no changes).

Should I be running it a different way? Also, I am leveraging fastai (version 1.0.59), just in case that matters.

Here is the full error stack from the anomaly detect run.

(can only upload one screen capture at a time)

You should get two stack traces: one pointing at a function in the forward pass and one at the backward call.

When I run this with detect anomaly on, I get the same error stack trace. I don't have a NaN value problem. My tensors and loss show as follows:

We can run the forward function entirely on its own and there are no problems, as shown in the output of the debug screen capture above.

@albanD Just wanted to let you know that this had to do with the shape of the tensors. We were able to fix the problem by using the MSELossFlat loss function provided in the fastai framework. This essentially wraps the torch.nn.MSELoss function in a flattener.
Below is a link to the fastai version of the loss function.
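The flattening idea can be sketched as a small wrapper around torch.nn.MSELoss (FlatMSELoss here is a hypothetical name, not fastai's actual implementation): flattening both tensors makes shapes like (N, 1) vs (N,) line up instead of silently broadcasting.

```python
import torch
import torch.nn as nn

class FlatMSELoss(nn.Module):
    """Minimal sketch of an MSE loss that flattens both arguments."""

    def __init__(self):
        super().__init__()
        self.loss = nn.MSELoss()

    def forward(self, input, target):
        # Flatten both tensors and cast the target to float so a
        # (N, 1) prediction and a (N,) target compare element-wise.
        return self.loss(input.view(-1), target.view(-1).float())

pred = torch.randn(4, 1, requires_grad=True)
target = torch.randn(4)
loss = FlatMSELoss()(pred, target)
loss.backward()
```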

This is an interesting bug because we have not encountered it before in our classification problems. We freely interchange PyTorch and fastai loss functions; however, in this case it was a problem. We were working on the Rossmann sales problem.

Just to clarify the solution:
you were able to avoid the type mismatch by using the FastAI MSELossFlat method, but are still seeing it using plain PyTorch code?


Is there any other way to solve this problem? I am facing the same issue too.

Are you seeing this issue using FastAI as well or plain PyTorch?

I solved that issue by converting my tensors to float using .float(), and it worked.
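A minimal sketch of that fix, assuming the target came in as an integer (Long) tensor, e.g. loaded from an integer column:

```python
import torch
import torch.nn as nn

# Integer-valued target: torch.tensor of Python ints defaults to int64 (Long)
target = torch.tensor([1, 0, 3])
pred = torch.randn(3, requires_grad=True)

# Cast the target to float32 before computing a regression loss,
# so both arguments to MSELoss have the same floating dtype.
loss = nn.MSELoss()(pred, target.float())
loss.backward()
```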