Run_backward: expected dtype Float but got dtype Long

juperez999 · November 20, 2019, 7:35pm

I am trying to run the Variable._execution_engine.run_backward call and am getting an error:
RuntimeError: expected dtype Float but got dtype Long. This happens even though both tensors are float32. I have attached a picture below showing the output from the debugger along with the error.

We are using the torch.nn.MSELoss function and we are getting similar results with the torch.nn.SmoothL1Loss function. Any ideas on what the problem is or what next steps should be?
We are running pytorch version 1.4.0.dev20191111.

albanD · November 20, 2019, 8:33pm

Could you run with anomaly mode enabled to see which forward function creates this error please?
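For reference, anomaly mode can be enabled with the `torch.autograd.detect_anomaly()` context manager (or globally with `torch.autograd.set_detect_anomaly(True)`). The following is a minimal sketch only: the model, data, and loss here are hypothetical stand-ins, since the original training code is not shown in this thread.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model and data; the actual model from the thread is not shown.
model = nn.Linear(4, 1)
inputs = torch.randn(8, 4)
targets = torch.randn(8, 1)

# Run BOTH the forward pass and backward() inside the context manager.
# The forward pass must happen here so autograd can record where each
# operation was created; backward() must happen here so failures are reported
# together with that recorded forward stack trace.
with torch.autograd.detect_anomaly():
    loss = nn.MSELoss()(model(inputs), targets)
    loss.backward()
```

Anomaly mode slows execution noticeably, so it is usually enabled only while debugging.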

juperez999 · November 20, 2019, 8:47pm

So I reran using the following code and still received the same error (no changes).

Should I be running a different way? Also I am leveraging fastai (ver 1.0.59) just in case that matters.

juperez999 · November 20, 2019, 8:48pm

Here is the full error stack from the anomaly detect run.

(can only upload one screen capture at a time)

albanD · November 20, 2019, 8:57pm

You should get two stack traces. One with a function in the forward and one with the backward call.

juperez999 · November 21, 2019, 12:20am

When I run this with detect anomaly on, I get the same error stack trace. I don't have a NaN value problem. My tensors and loss are showing as follows:

We can run the forward function entirely on its own without any problems, as the debugger screen capture above shows.

juperez999 · November 21, 2019, 12:48am

@albanD Just wanted to let you know that this had to do with the shape of the tensors. We were able to fix the problem by using the MSELossFlat loss function provided in the fastai framework. This essentially wraps the torch.nn.MSELoss function in a flattener.
Below is a link to the fastai version of the loss function.

https://github.com/fastai/fastai/blob/8013797e05f0ae0d771d60ecf7cf524da591503c/fastai/layers.py#L257


    return FlattenedLoss(nn.CrossEntropyLoss, *args, axis=axis, **kwargs)


def BCEWithLogitsFlat(*args, axis:int=-1, floatify:bool=True, **kwargs):
    "Same as `nn.BCEWithLogitsLoss`, but flattens input and target."
    return FlattenedLoss(nn.BCEWithLogitsLoss, *args, axis=axis, floatify=floatify, is_2d=False, **kwargs)


def BCEFlat(*args, axis:int=-1, floatify:bool=True, **kwargs):
    "Same as `nn.BCELoss`, but flattens input and target."
    return FlattenedLoss(nn.BCELoss, *args, axis=axis, floatify=floatify, is_2d=False, **kwargs)


def MSELossFlat(*args, axis:int=-1, floatify:bool=True, **kwargs):
    "Same as `nn.MSELoss`, but flattens input and target."
    return FlattenedLoss(nn.MSELoss, *args, axis=axis, floatify=floatify, is_2d=False, **kwargs)


class NoopLoss(Module):
    "Just returns the mean of the `output`."
    def forward(self, output, *args): return output.mean()


class WassersteinLoss(Module):
    "For WGAN."
    def forward(self, real, fake): return real.mean() - fake.mean()

This is an interesting bug because we have not encountered it before in our classification problems. We freely interchange PyTorch and fastai loss functions, but in this case it was a problem. We were working on the Rossmann sales problem.
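To illustrate the idea of wrapping `nn.MSELoss` in a flattener, here is a rough sketch in plain PyTorch. It is not fastai's implementation: `flat_mse` is a hypothetical helper, and the float cast is an assumption based on the `floatify` argument visible in the excerpt above. The tensor shapes are also made up for the example.

```python
import torch
import torch.nn as nn

def flat_mse(pred, target, floatify=True):
    """Hypothetical helper: flatten both tensors to 1-D, then apply nn.MSELoss."""
    if floatify:
        # Assumed behaviour: cast the target to float before computing the loss.
        target = target.float()
    return nn.MSELoss()(pred.reshape(-1), target.reshape(-1))

# Example shapes: a regression head often outputs (N, 1) while targets are (N,).
pred = torch.randn(8, 1, requires_grad=True)
target = torch.randint(0, 10, (8,))  # integer targets, dtype int64 (Long)

# After flattening, both tensors have shape (8,), so the loss compares them
# element by element instead of broadcasting (8, 1) against (8,).
loss = flat_mse(pred, target)
loss.backward()
```

Flattening both sides makes the shapes match exactly, and the cast makes the dtypes match, so either mismatch is removed before the loss is computed.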

ptrblck · November 21, 2019, 4:29am

Just to clarify the solution:
you were able to avoid the type mismatch by using the FastAI MSELossFlat method, but are still seeing it using plain PyTorch code?

Phanikumar2all · April 8, 2020, 12:55pm

Hello,

Is there any other way to solve this problem? I am facing the same problem too.

ptrblck · April 9, 2020, 12:30am

Are you seeing this issue using FastAI as well or plain PyTorch?

Phanikumar2all · April 9, 2020, 10:15am

I solved that issue by converting my tensors to float using .float(). This worked.
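A minimal sketch of that fix in plain PyTorch, assuming integer-valued targets (the tensors here are made up for illustration and are not from the original code):

```python
import torch
import torch.nn as nn

pred = torch.randn(5, requires_grad=True)  # float32 predictions
target = torch.tensor([3, 1, 4, 1, 5])     # Python ints produce dtype int64 (Long)

# Inspect the dtypes before computing the loss; a Long target is easy to miss.
print(pred.dtype, target.dtype)

# Cast the target to float32 so both loss inputs share the same dtype.
loss = nn.MSELoss()(pred, target.float())
loss.backward()
```

Checking `.dtype` on both the prediction and the target right before the loss call is a quick way to confirm whether a cast like this is needed.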