Error when loss.backward() is called

I’m trying to train a simple CNN, but when I call loss.backward() I get the following error:

RuntimeError: expected dtype Double but got dtype Float (validate_dtype at ..\aten\src\ATen\native\TensorIterator.cpp:143)

My code:

import torch
import torch.nn as nn
import torch.nn.functional as F 
import torch.optim as optim
import numpy as np
from tqdm import tqdm

class Net(nn.Module):
    def __init__(self):
        super().__init__()  # nn.Module must be initialized before layers are assigned
        self.conv1 = nn.Conv2d(1,32, 3) #48; after max pool -> 24
        self.conv2 = nn.Conv2d(32,64, 3) #22; after max pool -> 11
        self.conv3 = nn.Conv2d(64,128, 3) #9; after max pool -> 4.5 -> 4

        self.fc1 = nn.Linear(128*4*4, 512)
        self.fc2 = nn.Linear(512, 2)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)),(2,2))
        x = F.max_pool2d(F.relu(self.conv2(x)),(2,2))
        x = F.max_pool2d(F.relu(self.conv3(x)),(2,2))
        x = torch.flatten(x,start_dim=1,end_dim=-1) 
        #x = x.view(-1,2048)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.softmax(x,dim=1)

net = Net()
optimizer = optim.Adam(net.parameters(), lr=0.001)

loss_function = nn.MSELoss()

training_data = np.load("training_data.npy",allow_pickle=True)

X = torch.tensor([i[0] for i in training_data]).view(-1,50,50)# Images
X = X/255.0  # scale pixel values to the range 0 to 1
y = torch.tensor([i[1] for i in training_data])  # one-hot label vectors

VAL_PCT = 0.1

val_size = int(len(X)*VAL_PCT)

train_X = X[:-val_size]
train_y = y[:-val_size]

test_X = X[-val_size:]
test_y = y[-val_size:]



for epoch in range(EPOCHS):
    for i in tqdm(range(0, len(train_X),BATCH_SIZE)):
        #print(i, i+BATCH_SIZE)
        batch_X = train_X[i:i+BATCH_SIZE].view(-1,1,50,50)
        batch_y = train_y[i:i+BATCH_SIZE]

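        optimizer.zero_grad()  # clear gradients left over from the previous batch
        outputs = net(batch_X)
        loss = loss_function(outputs, batch_y)
        loss.backward()  # the RuntimeError is raised here
        optimizer.step()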

The traceback message:

Traceback (most recent call last):
  File "e:\Desenvolvimento\PyTorch\", line 63, in <module>
  File "C:\Users\erikj\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\", line 198, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "C:\Users\erikj\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\autograd\", line 100, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: expected dtype Double but got dtype Float (validate_dtype at ..\aten\src\ATen\native\TensorIterator.cpp:143)
(no backtrace available)

Torch Version:


As far as I know, I’m not using CUDA; this is just for learning purposes.


After some research, I found a solution.
I set the dtype of the tensor in the variable ‘y’ to torch.float32 by passing dtype=torch.float32 when creating it.

I don’t know if it’s the best solution, so any help is welcome.


That’s the right solution, since nn.MSELoss expects the input and the target to have the same dtype.
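
For anyone landing here later, a minimal sketch of that fix. It assumes the labels are one-hot rows stored as float64 NumPy arrays; the `labels` list and the random `outputs` tensor are stand-ins for the poster's `training_data` and `net(batch_X)`, which aren't available here.

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical stand-in for the labels in training_data.npy:
# rows taken from np.eye are float64 by default.
labels = [np.eye(2)[i % 2] for i in range(4)]

# Without an explicit dtype, the tensor inherits float64 from NumPy.
y_double = torch.tensor(np.array(labels))

# The fix from the post: request float32 when building the target tensor.
y_float = torch.tensor(np.array(labels), dtype=torch.float32)

# Stand-in for the network output, which is float32 by default.
outputs = torch.rand(4, 2, requires_grad=True)

loss = nn.MSELoss()(outputs, y_float)
loss.backward()  # input and target share a dtype, so backward runs

print(y_double.dtype, y_float.dtype, outputs.grad.dtype)
```

Converting an existing tensor with `y = y.float()` should have the same effect as passing `dtype=torch.float32` at construction.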


Just curious: would it be possible for PyTorch to handle this issue inside backward() itself? Is there a reason it throws this exception instead?

Since the core issue is passing different dtypes to MSELoss, it seems to me that a TypeError should be thrown on loss = loss_function(outputs, batch_y) instead of a RuntimeError on loss.backward(). Why allow users to compute a loss tensor that is known to be invalid for backpropagation? I’m curious whether the team has any thoughts on this.
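
A user-side version of that early check could look like the sketch below. Note that `checked_mse_loss` is a hypothetical helper written for illustration, not a PyTorch function; it only moves the failure to the line where the loss is computed.

```python
import torch
import torch.nn.functional as F

def checked_mse_loss(outputs, target):
    # Hypothetical helper, not part of PyTorch: fail at loss computation
    # with a TypeError instead of later, inside backward().
    if outputs.dtype != target.dtype:
        raise TypeError(
            f"dtype mismatch: outputs is {outputs.dtype}, target is {target.dtype}"
        )
    return F.mse_loss(outputs, target)

outputs = torch.rand(4, 2, requires_grad=True)   # float32, like a model output
target = torch.zeros(4, 2, dtype=torch.float64)  # float64, like the labels in the question

try:
    checked_mse_loss(outputs, target)
    raised = False
except TypeError as err:
    raised = True
    message = str(err)

# After casting the target to the output dtype, the check passes.
loss = checked_mse_loss(outputs, target.to(outputs.dtype))
loss.backward()
```

Calling this in place of `loss_function(outputs, batch_y)` would report the mismatch on the line that causes it.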