Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

Hi everyone,
I’m training a model using PyTorch and while running the train function I encounter the following error message:

RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

During the run, I noticed that after the first epoch my tensors change their device from the GPU back to the CPU, as can be seen here:

I would really appreciate your help,
thank you in advance

Check whether the data was properly moved to the GPU. This error indicates a device mismatch during model execution, while the model's parameters already seem to be on the GPU.
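A quick way to confirm where everything lives (a minimal sketch with placeholder modules, not code from this thread) is to print the devices of the parameters and the input directly before the forward pass:

```python
import torch
import torch.nn as nn

# hypothetical stand-ins for the actual model and batch
model = nn.Linear(4, 2)
images = torch.randn(8, 4)

# both devices must match before calling model(images)
param_device = next(model.parameters()).device
print(param_device, images.device)
```

If the two printed devices differ, the next `model(images)` call will raise exactly this kind of RuntimeError.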

At each training iteration I’m moving both the data and the model to the GPU as in the attached code:

def train(num_epochs, model, optimizer, loss_fn, train_loader):
    best_accuracy = 0.0
    # Define your execution device
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print("The model will be running on", device, "device")
    # Convert model parameters and buffers to CPU or CUDA

    for epoch in range(num_epochs):  # loop over the dataset multiple times
        running_loss = 0.0
        running_acc = 0.0

        for i, (images, labels) in enumerate(tqdm(train_loader), 0):

            model = model.to(torch.device('cuda'))
            # get the inputs
            images = images.to(torch.device('cuda'))
            labels = labels.to(torch.device('cuda'))

            # zero the parameter gradients
            optimizer.zero_grad()
            # predict classes using images from the training set
            outputs = model(images)
            # compute the loss based on model output and real labels
            loss = loss_fn(outputs, torch.max(labels, 1)[1])
            # backpropagate the loss
            loss.backward()
            # adjust parameters based on the calculated gradients
            optimizer.step()

            # Let's print statistics for every 1,000 images
            running_loss += loss.item()  # extract the loss value
            if i % 10 == 0:
                print('[%d, %5d] loss: %.3f' %
                      (epoch + 1, i + 1, running_loss / 1000))
                # zero the loss
                running_loss = 0.0
        # Compute and print the average accuracy for this epoch when tested over all 10000 test images
        accuracy = test_accuracy(model, train_loader)
        print('For epoch', epoch + 1, 'the test accuracy over the whole test set is %d %%' % (accuracy))

        # we want to save the model if the accuracy is the best
        if accuracy > best_accuracy:
            best_accuracy = accuracy

Or should I do it earlier, while creating the Dataset?

Your code looks correct, though you could remove the:

model = model.to(torch.device('cuda'))

from the DataLoader loop, as the model should be moved to the device once before the training starts.
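The usual pattern looks like this (a sketch with a placeholder model and a single fake batch, just to show the structure): the model moves once before the epoch loop, and only the per-batch tensors move inside it.

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 2)   # placeholder for the actual model
model = model.to(device)   # move the parameters once, before the epoch loop

for images, labels in [(torch.randn(4, 10), torch.randint(0, 2, (4,)))]:
    # only the per-batch tensors are moved inside the loop
    images = images.to(device)
    labels = labels.to(device)
    outputs = model(images)
```

Moving the model inside the loop is a no-op after the first iteration, but it hides the real question of whether every loop that consumes batches also moves them.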

Are you creating any tensors in the forward method without moving them to the GPU and could you also check the validation or test loop and make sure the data is also moved to the GPU there?
If you get stuck, could you post a minimal, executable code snippet reproducing the issue, please?

If I understand you correctly, should I move the data to the GPU immediately when I create the DataLoader?

train_set = Dataset.CtDataset(x_train, y_train, transform=train_transform, kind='train')
val_set = Dataset.CtDataset(x_test, y_test, transform=val_transform, kind='val')

print('Train size: {}'.format(len(train_set)))
print('Test size: {}'.format(len(val_set)))

train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
valid_loader = DataLoader(val_set, batch_size=64, shuffle=True)

I will check whether it happens during test or validation. The line in which I get the error is in bold, and it occurs right as the first epoch is about to end. I tried moving my input to the GPU within the forward method, but it had no effect. I'm attaching my forward method with the whole model class:

class Model(nn.Module):

    def __init__(self):
        super(Model, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=12, kernel_size=5, stride=1, padding=1)
        self.bn1 = nn.BatchNorm2d(12)
        self.conv2 = nn.Conv2d(in_channels=12, out_channels=12, kernel_size=5, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(12)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv4 = nn.Conv2d(in_channels=12, out_channels=24, kernel_size=5, stride=1, padding=1)
        self.bn4 = nn.BatchNorm2d(24)
        self.conv5 = nn.Conv2d(in_channels=24, out_channels=24, kernel_size=5, stride=1, padding=1)
        self.bn5 = nn.BatchNorm2d(24)
        self.fc1 = nn.Linear(80736, 64)

    def forward(self, x):
        **x = F.relu(self.bn1(self.conv1(x)))**
        x = F.relu(self.bn2(self.conv2(x)))
        x = self.pool(x)
        x = F.relu(self.bn4(self.conv4(x)))
        x = F.relu(self.bn5(self.conv5(x)))
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        return x
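For reference, the 80736 input features of `fc1` can be confirmed with a dummy forward pass through the same conv/pool stack (batchnorm and ReLU do not change shapes). This is a sketch assuming 128×128 inputs, which is what 24 × 58 × 58 = 80736 implies:

```python
import torch
import torch.nn as nn

# the same conv/pool stack as the model above
features = nn.Sequential(
    nn.Conv2d(3, 12, kernel_size=5, stride=1, padding=1),
    nn.Conv2d(12, 12, kernel_size=5, stride=1, padding=1),
    nn.MaxPool2d(2, 2),
    nn.Conv2d(12, 24, kernel_size=5, stride=1, padding=1),
    nn.Conv2d(24, 24, kernel_size=5, stride=1, padding=1),
)
x = torch.randn(1, 3, 128, 128)   # assumed input resolution
flat = features(x).view(1, -1)
print(flat.size(1))               # 80736, matching nn.Linear(80736, 64)
```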

Thank you so much for helping me!

Based on the location of the error, the input doesn't seem to be moved to the device, and the forward method looks alright.
Check if:

images = images.to(torch.device('cuda'))
labels = labels.to(torch.device('cuda'))

is used in all DataLoader loops (training, validation, test, etc.).
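Note that a DataLoader itself cannot be moved to a device; only the tensors it yields can. A minimal sketch of the same per-batch pattern in a validation loop, using a placeholder dataset in place of `CtDataset`:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# placeholder dataset standing in for CtDataset
val_set = TensorDataset(torch.randn(8, 3), torch.randint(0, 7, (8,)))
valid_loader = DataLoader(val_set, batch_size=4)

for images, labels in valid_loader:
    # the loader yields CPU tensors; each batch is moved explicitly
    images = images.to(device)
    labels = labels.to(device)
```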

I think I managed to find the problem: when passing the DataLoader batches to the accuracy function, I did not move them to the right device, which may have caused the crash. Checking it now, will update shortly.
Thank you!

I managed to find the problem - as I mentioned in my other comment, it came from the DataLoader: I didn't move the batches onto the GPU in the accuracy function. After doing so, I'm now getting the following error:

RuntimeError: The size of tensor a (64) must match the size of tensor b (7) at non-singleton dimension 1

The function:

def test_accuracy(model, test_loader):
    acc = 0.0
    total = 0.0

    with torch.no_grad():
        for data in test_loader:
            images, labels = data
            images = images.to(torch.device('cuda'))
            labels = labels.to(torch.device('cuda'))
            # run the model on the test set to predict labels
            outputs = model(images)
            # the label with the highest energy will be our prediction
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            acc += (predicted == labels).sum().item()

    # compute the accuracy over all test images
    acc = (100 * acc / total)
    return (acc)

I guess the error is raised in the accuracy calculation:

predicted = torch.randint(0, 10, (2, 64))
labels = torch.randint(0, 10, (2, 7))
(predicted == labels).sum().item()
# RuntimeError: The size of tensor a (64) must match the size of tensor b (7) at non-singleton dimension 1

so check the shapes of these tensors and make sure you can compare them.
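If the labels are one-hot encoded over 7 classes, as the `torch.max(labels, 1)[1]` in the training loop suggests, the test loop likely needs the same argmax before comparing. A sketch under that assumption, with random stand-in tensors:

```python
import torch

# assume a batch of 64 one-hot labels over 7 classes
labels = torch.zeros(64, 7)
labels[torch.arange(64), torch.randint(0, 7, (64,))] = 1.0

predicted = torch.randint(0, 7, (64,))  # class indices, as torch.max(outputs, 1) returns

# comparing (64,) against (64, 7) raises the shape error above;
# reduce the one-hot labels to class indices first
targets = torch.max(labels, 1)[1]       # shape: (64,)
correct = (predicted == targets).sum().item()
```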