Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

Hi everyone,
I’m training a model using PyTorch and while running the train function I encounter the following error message:

RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

During the run, I noticed that after the first epoch my tensor moves from the GPU back to the CPU.

I would really appreciate your help,
thank you in advance

Check if the data was properly moved to the GPU. This error indicates a device mismatch during model execution: the model's parameters are already on the GPU, while the input is still on the CPU.
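
For reference, here is a minimal sketch that reproduces this device mismatch (the layer and shapes are just illustrative):

import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, kernel_size=3).to('cuda')  # weight is a torch.cuda.FloatTensor
x = torch.randn(1, 3, 24, 24)                     # input is a torch.FloatTensor on the CPU
out = conv(x)                                     # raises the RuntimeError quoted above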


At each training iteration I’m moving both the data and the model to the GPU as in the attached code:

import torch
from time import sleep
from tqdm import tqdm

def train(num_epochs, model, optimizer, loss_fn, train_loader):
    best_accuracy = 0.0
    # Define your execution device
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print("The model will be running on", device, "device")
    # Convert model parameters and buffers to CPU or CUDA

    for epoch in range(num_epochs):  # loop over the dataset multiple times
        running_loss = 0.0
        running_acc = 0.0

        for i, (images, labels) in enumerate(tqdm(train_loader), 0):

            model = model.to(torch.device('cuda'))
            # get the inputs
            images = images.to(torch.device('cuda'))
            labels = labels.to(torch.device('cuda'))

            # zero the parameter gradients
            optimizer.zero_grad()
            # predict classes using images from the training set
            outputs = model(images)
            # compute the loss based on model output and real labels
            loss = loss_fn(outputs, torch.max(labels, 1)[1])
            # backpropagate the loss
            loss.backward()
            # adjust parameters based on the calculated gradients
            optimizer.step()

            # accumulate the loss and print statistics every 10 iterations
            running_loss += loss.item()  # extract the loss value
            if i % 10 == 0:
                # print the average loss over the last 10 iterations
                print('[%d, %5d] loss: %.3f' %
                      (epoch + 1, i + 1, running_loss / 10))
                # zero the running loss
                running_loss = 0.0
            sleep(0.1)
        # compute and print the average accuracy for this epoch
        accuracy = test_accuracy(model, train_loader)
        print('For epoch', epoch + 1, 'the test accuracy over the whole test set is %d %%' % (accuracy))

        # we want to save the model if the accuracy is the best
        if accuracy > best_accuracy:
            save_model()
            best_accuracy = accuracy

Or should I do it earlier, while creating the Dataset?

Your code looks correct, and you could remove the:

model = model.to(torch.device('cuda'))

from the DataLoader loop, as the model only needs to be moved to the device once, before training starts.
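
As a sketch of what I mean (using the same names as your snippet):

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model = model.to(device)  # move the model once, before the epoch loop

for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(tqdm(train_loader)):
        # only the data is moved inside the loop
        images = images.to(device)
        labels = labels.to(device)
        ...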

Are you creating any tensors in the forward method without moving them to the GPU? Could you also check the validation or test loop and make sure the data is moved to the GPU there as well?
If you get stuck, could you post a minimal, executable code snippet reproducing the issue, please?


If I understand you correctly, should I move the data to the GPU immediately as I create the DataLoader?

train_set = Dataset.CtDataset(x_train, y_train, transform=train_transform, kind='train')
val_set = Dataset.CtDataset(x_test, y_test, transform=val_transform, kind='val')

print('Train size: {}'.format(len(train_set)))
print('Test size: {}'.format(len(val_set)))

train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
valid_loader = DataLoader(val_set, batch_size=64, shuffle=True)

I will check whether it works during test or validation. The line in which I get the error is marked with a comment below, and it occurs right when the first epoch is about to end. I tried moving my input to the GPU within the forward method, but there was no effect… I'm attaching my forward method with the whole model class:

class Model(nn.Module):

    def __init__(self):
        super(Model, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=12, kernel_size=5, stride=1, padding=1)
        self.bn1 = nn.BatchNorm2d(12)
        self.conv2 = nn.Conv2d(in_channels=12, out_channels=12, kernel_size=5, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(12)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv4 = nn.Conv2d(in_channels=12, out_channels=24, kernel_size=5, stride=1, padding=1)
        self.bn4 = nn.BatchNorm2d(24)
        self.conv5 = nn.Conv2d(in_channels=24, out_channels=24, kernel_size=5, stride=1, padding=1)
        self.bn5 = nn.BatchNorm2d(24)
        self.fc1 = nn.Linear(80736, 64)

    def forward(self, x):
        print(x.device)
        x = F.relu(self.bn1(self.conv1(x)))  # <- the RuntimeError is raised here
        x = F.relu(self.bn2(self.conv2(x)))
        x = self.pool(x)
        x = F.relu(self.bn4(self.conv4(x)))
        x = F.relu(self.bn5(self.conv5(x)))
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        return x

Thank you so much for helping me!

Based on the location of the error, the input doesn't seem to be moved to the device, and the forward method looks alright.
Check if:

images = images.to(torch.device('cuda'))
labels = labels.to(torch.device('cuda'))

is used in all DataLoader loops (training, validation, test, etc.).

I think I managed to find the problem: when passing the DataLoader to the accuracy function, I did not move the data to the GPU, which may have caused the crash. Checking it now, will update shortly.
Thank you!

I managed to find the problem. As I mentioned in another comment, it came from the DataLoader: I didn't move the data onto the GPU in the accuracy function. After doing so, I'm now getting the following error:

RuntimeError: The size of tensor a (64) must match the size of tensor b (7) at non-singleton dimension 1

The function:

def test_accuracy(model, test_loader):
    model.eval()
    acc = 0.0
    total = 0.0

    with torch.no_grad():
        for data in test_loader:
            images, labels = data
            images = images.to(torch.device('cuda'))
            labels = labels.to(torch.device('cuda'))
            # run the model on the test set to predict labels
            outputs = model(images)
            # the label with the highest energy will be our prediction
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            acc += (predicted == labels).sum().item()

    # compute the accuracy over all test images
    acc = (100 * acc / total)
    return acc

I guess the error is raised in the accuracy calculation:

predicted = torch.randint(0, 10, (2, 64))
labels = torch.randint(0, 10, (2, 7))
(predicted == labels).sum().item()
# RuntimeError: The size of tensor a (64) must match the size of tensor b (7) at non-singleton dimension 1

so check the shapes of these tensors and make sure you can compare them.
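
Since your training loop already calls torch.max(labels, 1)[1] before the loss, the labels might be one-hot encoded; if that's the case here, a sketch of the fix inside test_accuracy would be:

# if labels arrive one-hot encoded with shape [batch_size, num_classes],
# reduce them to class indices before comparing with the predictions
labels = torch.max(labels, 1)[1]
acc += (predicted == labels).sum().item()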

I have the same error and I don't know what to do. This is my code:
# prediction function
def prediction(img_path, transformer):

    image = Image.open(img_path)

    image_tensor = transformer(image).float()

    image_tensor = image_tensor.unsqueeze_(0)

    if torch.cuda.is_available():
        image_tensor.cuda()

    input = Variable(image_tensor)

    output = model(input)

    index = output.data.numpy().argmax()

    pred = classes[index]

    return pred

images_path = glob.glob(pred_path + '/*.jpg')
pred_dict = {}
for i in images_path:
    pred_dict[i[i.rfind('/') + 1:]] = prediction(i, transformer)

This code:

if torch.cuda.is_available():
    image_tensor.cuda()

won’t work as you need to reassign the image_tensor via:

if torch.cuda.is_available():
    image_tensor = image_tensor.cuda()
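
Note that this differs from modules: for tensors, .cuda() and .to() return a new tensor and leave the original one untouched, while calling .cuda() on an nn.Module moves its parameters and buffers in place:

x = torch.randn(2)
x.cuda()                                # returns a new CUDA tensor, discarded here
print(x.device)                         # cpu

model = nn.Linear(2, 2)
model.cuda()                            # moves the parameters in place
print(next(model.parameters()).device)  # cuda:0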

I also encountered the same error:
RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

My code is here:

history += fit(EPOCHS, LR, model, train_dl, val_dl, torch.optim.Adam)

def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.SGD):
    history = []
    optimizer = opt_func(model.parameters(), lr)
    for epoch in range(epochs):
        # Training Phase 
        model.train()
        train_losses = []
        for batch in train_loader:
            loss = model.training_step(batch)
            train_losses.append(loss)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        # Validation phase
        result = evaluate(model, val_loader)
        result['train_loss'] = torch.stack(train_losses).mean().item()
        model.epoch_end(epoch, result)
        history.append(result)
    return history

@ptrblck could you please help me with this?
It fails here:

result = evaluate(model, val_loader)

It looks like there is something wrong with val_loader.

I managed to resolve this issue. I had missed moving val_data to the GPU, and hence it was throwing the error.

Hi @ptrblck. Could you help me with this issue?

I've moved the data, model, and loss function to CUDA, but it seems the input tensor is not switching to CUDA.

The model is a CNN with an embedded quantum circuit.

RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

train_data = torchvision.datasets.ImageFolder('path_to_train', transform=transforms.Compose([transforms.ToTensor()]))
test_data = torchvision.datasets.ImageFolder('path_to_test', transform=transforms.Compose([transforms.ToTensor()]))

train_loader = DataLoader(train_data, shuffle=True, batch_size=1)
test_loader = DataLoader(test_data, shuffle=True, batch_size=1)

for data, target in train_loader:
    data = data.to(device)
    target = target.to(device)

for data, target in test_loader:
    data = data.to(device)
    target = target.to(device)

class Net(Module):
    def __init__(self, qnn):
        super().__init__()
        self.conv1 = Conv2d(3, 1, kernel_size=5)
        self.conv2 = Conv2d(1, 1, kernel_size=5)
        self.dropout = Dropout2d()
        self.fc1 = Linear(3844, 64)
        self.fc2 = Linear(64, 2)  # 2-dimensional input to QNN
        self.qnn = TorchConnector(qnn) # Apply torch connector, weights chosen
        # uniformly at random from interval [-1,1].
        self.fc3 = Linear(1, 1)  # 1-dimensional output from QNN

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2)
        x = self.dropout(x)
        x = x.view(x.shape[0], -1)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        x = self.qnn(x)  # apply QNN
        x = self.fc3(x)
        return cat((x, 1 - x), -1)


model = Net(qnn)

model = model.to('cuda')

# Define model, optimizer, and loss function
optimizer = optim.Adam(model.parameters(), lr=0.001)
loss_func = NLLLoss().to('cuda')

# Start training
epochs = 10  # Set number of epochs
loss_list = []  # Store loss history
model.train()  # Set model to training mode

for epoch in range(epochs):
    total_loss = []
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad(set_to_none=True)  # Initialize gradient
        print(data.type(), target.type())
        output = model(data)  # Forward pass
        loss = loss_func(output, target)  # Calculate loss
        loss.backward()  # Backward pass
        optimizer.step()  # Optimize weights
        total_loss.append(loss.item())  # Store loss
    loss_list.append(sum(total_loss) / len(total_loss))
    print("Training [{:.0f}%]\tLoss: {:.4f}".format(100.0 * (epoch + 1) / epochs, loss_list[-1]))

Sorry, this was missing:

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

You've moved the temporary data to the GPU in a standalone dummy loop:

for data, target in train_loader:
    data = data.to(device)
    target = target.to(device)

but not the data actually used in the training loop:

for epoch in range(epochs):
    total_loss = []
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad(set_to_none=True)  # Initialize gradient
        print(data.type(), target.type())
        output = model(data)  # Forward pass
        ...

Add data = data.to(device) inside the training loop and it should work.
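
In other words, something like this (a sketch using your variable names):

for epoch in range(epochs):
    total_loss = []
    for batch_idx, (data, target) in enumerate(train_loader):
        data = data.to(device)    # move each batch to the GPU
        target = target.to(device)
        optimizer.zero_grad(set_to_none=True)
        output = model(data)  # Forward pass
        ...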


@ptrblck Yes! I figured it out! Thanks anyway!!!
The input tensor has shape [3, 260, 260], standing for the RGB channels, height, and width of the images.

Do you think I could change something in the CNN to improve the accuracy? Any quick insight you can share?

Looking forward to hearing from you.

Best,

I'm not familiar with your use case, so I cannot comment on the model architecture. However, since you are using nn.NLLLoss, make sure to pass log probabilities to this loss function via F.log_softmax.
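
For example, a minimal sketch (you could alternatively return F.log_softmax(x, dim=-1) directly from the model's forward):

output = model(data)                       # raw model output
log_probs = F.log_softmax(output, dim=-1)  # nn.NLLLoss expects log probabilities
loss = loss_func(log_probs, target)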


I've seen this problem when porting code from CUDA to MPS and solved it as below.
For the sake of other coders who may be less familiar with the various platforms, I think a generic answer would be: aside from moving the images and labels to the device, also move your model in the same way. Example below:

images = images.to(devc)
labels = labels.to(devc)
model = model.to(devc)

before doing:
outputs = model(images)
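
A device-agnostic way to pick devc when porting between backends could look like this (a sketch; devc is just the variable name from the snippet above):

if torch.cuda.is_available():
    devc = torch.device('cuda')
elif torch.backends.mps.is_available():
    devc = torch.device('mps')
else:
    devc = torch.device('cpu')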

I was getting the same error even though I had assigned a device to the images, labels, and outputs. Later I found that I was doing images.to(device) instead of images = images.to(device).
In short, don't forget to reassign the data and model to the original variable.