Switching the loss function causes "RuntimeError: Found dtype Double but expected Float"

legendu · October 7, 2021, 9:48pm

I tried to fine-tune the ResNet18 model on a GPU. The training code runs OK. However, when I decided to further train the resulting model with a different loss function (switching from L1 loss to L2 loss), I got the error below. No other code or data changed.

RuntimeError: Found dtype Double but expected Float

Any idea what might have caused the issue and how to fix it?

Below is the information of my environment.

OS:                     Ubuntu 20.04 in Docker
python                 3.8.10
torch                   1.9.1+cu111
torchaudio              0.9.1
torchvision             0.10.1+cu111

legendu · October 7, 2021, 10:01pm

It doesn’t work even if I call loss = loss.to(torch.float32) to convert the loss to torch.float32 before backpropagation.

isalirezag · October 7, 2021, 11:11pm

You shouldn't do it that way; you need to convert the target/label to float. Here is an example:

import torch
import torch.nn as nn

loss = nn.MSELoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
# Cast the target (not the loss) to float32 so it matches the input's dtype.
output = loss(input, target.float())
output.backward()
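Note that torch.randn returns float32 by default, so the .float() call in the example above does not change anything there. A minimal sketch of the case where the target really is float64 might look like the following. The tensors pred and target are hypothetical stand-ins for a model output and a label batch, and matching the target to the prediction's dtype is an assumption about what is wanted here, not something confirmed by the original poster's code.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for a model output and a batch of labels.
pred = torch.randn(3, 5, requires_grad=True)     # float32 by default
target = torch.randn(3, 5, dtype=torch.float64)  # simulates float64 labels

print(pred.dtype, target.dtype)  # torch.float32 torch.float64

# Cast the target to the prediction's dtype before computing the loss,
# rather than casting the loss afterwards.
target = target.to(pred.dtype)

loss = nn.MSELoss()(pred, target)
loss.backward()
```

Printing the dtypes right before the loss call is a cheap way to check whether the labels coming out of a data loader are float64.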

legendu · October 7, 2021, 11:20pm

What confuses me is why l1_loss works while mse_loss doesn’t.

isalirezag · October 7, 2021, 11:37pm

I'm not sure. One possible reason is that the backward pass of MSE cannot handle the double format, since it is the squared L2 norm, whereas for L1 this is not an issue.

ptrblck · October 8, 2021, 5:48am

It should not raise an issue in the backward, if the forward was successfully executed.
As mentioned in the other thread: could you post an executable code snippet which we could use to reproduce and debug this issue, please?
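A self-contained snippet of the kind requested could be sketched as below. It is only an assumption about what a reproduction might look like: try_loss is a hypothetical helper, the shapes are arbitrary, and the script makes no claim about which combinations fail, since that may depend on the installed PyTorch version.

```python
import torch
import torch.nn as nn


def try_loss(loss_fn, target_dtype):
    """Run one forward and backward pass.

    Returns None on success, otherwise the RuntimeError message.
    """
    pred = torch.randn(4, 1, requires_grad=True)  # float32 prediction
    target = torch.randn(4, 1, dtype=target_dtype)
    try:
        loss_fn(pred, target).backward()
    except RuntimeError as err:
        return str(err)
    return None


# Compare both losses against float32 and float64 targets.
for name, loss_fn in [("L1Loss", nn.L1Loss()), ("MSELoss", nn.MSELoss())]:
    for dtype in (torch.float32, torch.float64):
        print(torch.__version__, name, dtype, try_loss(loss_fn, dtype) or "ok")
```

Posting the printed lines together with the version makes it easier for others to see whether the mismatch is tied to a particular release.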

Joey_Huang · April 24, 2025, 8:46am

I hit the same issue when I replaced nn.L1Loss with nn.MSELoss in an AMP training environment.