Different results on CUDA and CPU

Here is a sample notebook: MNIST using PyTorch CNN 32,16 | Kaggle
It's an MNIST digit recognizer based on a CNN.
It works well enough on CUDA locally, but it seems to give worse results on CPU (either locally or on the Kaggle server).

You can see the results of the last code block in the notebook: the predicted digit for the first several images is 5. You can also see the full results here: MNIST using PyTorch CNN 32,16 | Kaggle. The prediction for the first 32 images is 5. There are more differences between the CUDA and CPU results, but the results for the first 32 images look especially weird.

Maybe something is wrong with my model that can be fixed? Or is it OK, and this is just a difference between devices?

Numerical differences between devices are expected and depend on the dtype used. E.g. for float32 you would expect a relative error of ~1e-6 (this also depends on the operations used etc., but might be a good first estimate). While some predictions could change, your description might point to a valid issue, since it seems that all 32 predictions return the same class on the CPU, which differs from the GPU predictions.
I also assume that you are using the same pretrained model and data for this comparison. If so, could you calculate the training and validation accuracy on both devices and check the gap between these?
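To get a feel for the expected magnitude of float32 error, here is a minimal sketch of my own (not from the notebook) that compares a float32 matmul against a float64 reference of the same computation:

```python
import torch

torch.manual_seed(0)
x = torch.rand(1000, 1000)

# Reference result in float64 vs. the same matmul in float32:
ref = x.double() @ x.double()
out = (x @ x).double()

# Maximum relative error over all elements; for float32 this is
# typically somewhere around 1e-6, depending on the reduction size.
rel_err = ((out - ref).abs() / ref.abs()).max()
print(rel_err.item())
```

Differences of this order between CPU and GPU are normal; they should shuffle borderline predictions at most, not collapse 32 predictions into one class.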

Yes, I use the same input data. And I don't use a pretrained model; it's trained in the notebook. I split the input data into a training set (90%) and a validation set (10%).

The results are very close:

CUDA:
train_accuracy=0.985
train_loss=0.0519
validation_accuracy=0.981
validation_loss=0.057
kaggle_accuracy=0.97978

CPU:
train_accuracy=0.986
train_loss=0.0474
validation_accuracy=0.981
validation_loss=0.0609
kaggle_accuracy=0.97771

kaggle_accuracy is calculated from the submission file containing the prediction '5' for the first 32 images on CPU. kaggle_accuracy is not very different between CPU and CUDA, but the first predictions still look very weird.

What’s the difference between train_accuracy and kaggle_accuracy?

train_accuracy is calculated on the 90% of the input data with known classes that is used for model training (the calculation is done in my notebook)

validation_accuracy is calculated on the 10% of the input data with known classes that is used for model validation (the calculation is done in my notebook)

kaggle_accuracy is calculated on data with unknown classes (the calculation is done by the Kaggle server, because I don't know the right classes)
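The 90/10 split described above can be sketched like this (a minimal sketch with random stand-in data; the real notebook uses the MNIST training CSV):

```python
import torch

# Stand-in dataset: 100 fake 28x28 images with random labels.
dataset = torch.utils.data.TensorDataset(
    torch.rand(100, 1, 28, 28),
    torch.randint(0, 10, (100,)),
)

# 90% for training, 10% for validation.
n_train = int(0.9 * len(dataset))
train_set, val_set = torch.utils.data.random_split(
    dataset, [n_train, len(dataset) - n_train]
)
print(len(train_set), len(val_set))  # 90 10
```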

It seems that the problem is related to prediction (not to model training):

  1. I trained the model on CUDA
  2. Saved the model to file
  3. Loaded it on CPU
  4. And I see the same problem: class ‘5’ for first images
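The save-on-CUDA / load-on-CPU round trip above can be sketched like this (a minimal sketch with `nn.Linear` standing in for the real model, and an in-memory buffer standing in for the file; `map_location` is the standard way to load CUDA-trained weights on a CPU-only machine):

```python
import io
import torch
import torch.nn as nn

# Stand-in for the trained model.
model = nn.Linear(4, 2)

# Save the state dict (to a buffer here; a file path works the same way).
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)

# Load it into a fresh instance, remapping all tensors to CPU.
buffer.seek(0)
model_cpu = nn.Linear(4, 2)
model_cpu.load_state_dict(torch.load(buffer, map_location=torch.device("cpu")))
model_cpu.eval()

print(torch.equal(model.weight, model_cpu.weight))  # True
```

Since the reloaded weights are bit-identical, identical wrong predictions on CPU point away from training and toward the inference path.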

I think I’ve found a bug in my notebook. I guess it’s not related to model itself, but to transfer of tensors between devices. I’ll try to locate it and clarify the question.

Sorry for the spam. The problem was caused by in-place modification of the input tensor.

The forward() method of my model contained the line: x /= 255

It works fine after I replaced it with: x = x / 255

Here is the simplified version of my original notebook:

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

# device = torch.device('cuda')
device = torch.device('cpu')

torch.manual_seed(42)

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, dtype=torch.float32)
        self.conv2 = nn.Conv2d(32, 16, kernel_size=3, dtype=torch.float32)
        self.linear1 = nn.Linear(400, 10, dtype=torch.float32)

    def forward(self, x):
        # x = x / 255
        x /= 255 # The problem was here
        x = (F.max_pool2d(self.conv1(x), 2))
        x = (F.max_pool2d(self.conv2(x), 2))
        x = x.view(-1, 400)
        x = F.log_softmax(self.linear1(x), dim=1)
        return x

model = Model().to(device)

X_test = torch.rand(20, 1, 28, 28, dtype=torch.float32) * 255
test_dataset = torch.utils.data.TensorDataset(X_test)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=100)

# The following warm-up loop calls Model.forward(), which
# modifies X_test in place via the line
# x /= 255
with torch.no_grad():
    for i in range(0, 10):
        model(X_test[i].to(device))

with torch.no_grad():
    for [data] in test_loader:
        data = data.to(device)
        output = model(data)
        print(output.data.max(1)[1])

If you run it on CPU, the predictions for the first 10 images will always be the same class, because those inputs were already divided by 255 in place by the previous loop.

When I ran the model on CUDA, .to(device) copied the input tensor to the GPU, so the in-place division modified the copy and the original input tensor on the CPU stayed unmodified. But when I ran the model on CPU, .to(device) is a no-op that returns the same tensor, so the forward() method modified the original input tensor.
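This aliasing behavior of .to() can be checked directly (a minimal sketch of my own):

```python
import torch

x = torch.full((2,), 255.0)

# On CPU, .to("cpu") is a no-op that returns the very same tensor:
same = x.to("cpu")
print(same.data_ptr() == x.data_ptr())  # True -- shared storage

# So an in-place division through the alias also changes x.
# (A CPU-to-CUDA .to("cuda") would return a copy instead,
# leaving the original CPU tensor untouched.)
same /= 255
print(x[0].item())  # 1.0
```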

I think it's a newbie problem 🙂 Maybe it will be helpful for someone. It's better to avoid in-place tensor modifications.
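The whole bug boils down to this difference (a minimal sketch of my own):

```python
import torch

x = torch.full((3,), 255.0)

# Out-of-place: allocates a new tensor, the original is untouched.
y = x / 255
print(x[0].item())  # 255.0

# In-place: mutates x's storage (and every alias of it).
x /= 255
print(x[0].item())  # 1.0
```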


Great debugging and good to hear you’ve isolated the issue. Thanks for sharing the root cause!
