RuntimeError: Expected tensor to have CPU Backend, but got tensor with CUDA Backend (while checking arguments for batch_norm_cpu)

Hi all, I get this error when I run inference on the CPU. The model works fine on the GPU, but on the CPU it throws the error above.

This is the model architecture:

import torch.nn as nn


class BasicBlock(nn.Module):

    def __init__(self, in_channels, out_channels, stride, activation):
        super(BasicBlock, self).__init__()

        self.conv1 = nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size=3,
            stride=stride,  # downsample with first conv
            padding=1,
            bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(
            out_channels,
            out_channels,
            kernel_size=3,
            stride=1,
            padding=1,
            bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.activation = activation
        
        self.shortcut = nn.Sequential()
        if in_channels != out_channels:
            self.shortcut.add_module(
                'conv',
                nn.Conv2d(
                    in_channels,
                    out_channels,
                    kernel_size=1,
                    stride=stride,  # downsample
                    padding=0,
                    bias=False))
            self.shortcut.add_module('bn', nn.BatchNorm2d(out_channels))  # BN

    def forward(self, x):
        y = self.activation(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        
        y += self.shortcut(x)
        y = self.activation(y)  # apply ReLU after addition
        return y
    
    
    
class DriverNet(nn.Module):

    def __init__(self):
        super(DriverNet, self).__init__()
        
        self.activation = nn.ReLU()

        self.conv_layers = nn.Sequential(
            nn.BatchNorm2d(3),
            BasicBlock(in_channels=3, out_channels=16, stride=2, activation=self.activation),
            BasicBlock(in_channels=16, out_channels=24, stride=2, activation=self.activation),
            BasicBlock(in_channels=24, out_channels=32, stride=2, activation=self.activation),
            BasicBlock(in_channels=32, out_channels=48, stride=2, activation=self.activation),
            BasicBlock(in_channels=48, out_channels=64, stride=2, activation=self.activation),
            BasicBlock(in_channels=64, out_channels=128, stride=2, activation=self.activation),
#             BasicBlock(in_channels=128, out_channels=256, stride = 2, activation = self.activation),
            nn.Dropout(p=0.5)
        )
        self.linear_layers = nn.Sequential(
            nn.Linear(in_features=128*4*5, out_features=128),
            self.activation,
#             nn.BatchNorm1d(128),
            nn.Dropout(p=0.5),
            nn.Linear(in_features=128, out_features=2),
        )
        

    def forward(self, x):
        output = self.conv_layers(x)
        output = output.view(output.size(0), -1)
        output = self.linear_layers(output)
        return output

This is the inference code:

import cv2
import torch

device = torch.device("cpu")  # torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model = DriverNet().to(device)
model = torch.load("model.pt")
model = model.eval()

# image and dim are defined earlier in my script
image = cv2.resize(image, dim, interpolation=cv2.INTER_AREA)
image_tensor = torch.from_numpy(image.transpose(2, 0, 1))
image_tensor = image_tensor.unsqueeze(0).float().to(device)
output = model(image_tensor)

The code snippet is unfortunately not executable, and common input shapes such as [batch_size, 3, 224, 224] and [batch_size, 3, 256, 256] yield shape mismatches, so I cannot really debug it.
However, based on the error message it seems that you are passing a GPU tensor to the model while the model’s parameters are on the CPU. You could add debug print statements inside the forward methods and check each activation’s .device attribute to isolate the issue.
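For example, a minimal sketch of such a check (the print statements are just illustrative and not part of your model) could look like this in BasicBlock.forward:

    def forward(self, x):
        # illustrative debug prints; remove them once the mismatched device is found
        print('input device:', x.device)
        print('bn1 weight device:', self.bn1.weight.device)
        y = self.activation(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        y += self.shortcut(x)
        y = self.activation(y)
        return y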

Thanks @ptrblck for your reply! You can debug it using this input: torch.rand(1, 3, 218, 291).to(device).
I will also do what you mentioned to find the issue.

Thanks for the shape information.
Your code snippet works fine in my setup:

device = torch.device("cuda")
model = DriverNet().to(device)
model = model.eval()

image_tensor = torch.randn(1,3,218,291).to(device)
output = model(image_tensor)
print(output)
> tensor([[ 0.0792, -0.0424]], device='cuda:0', grad_fn=<AddmmBackward>)

Could you remove the model = torch.load("model.pt") operation, as it could load any other stored model?

It works for me as well when I use device = torch.device("cuda"), and the model runs fine. But when I use torch.device("cpu"), it throws the error mentioned above.
Also, model = torch.load("model.pt") loads the right model that I want to test, and it works well on the GPU.

I think I figured out the issue.

device = torch.device("cpu") #torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model = DriverNet().to(device)
model = torch.load("model.pt")

I believe the saved model was on the GPU, so torch.load restores it on the GPU. I moved the freshly constructed model to the CPU before loading the checkpoint, so the loaded model replaces it and ends up back on the GPU, while the input is on the CPU.
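For example, a quick check like this (just an illustration, it was not in my original script) would have shown the mismatch:

print(next(model.parameters()).device)  # cuda:0 -- torch.load overwrote the CPU copy
print(image_tensor.device)              # cpu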
Now I do this instead:

device = torch.device("cpu") #torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model = DriverNet()
model = torch.load("model.pt")
model = model.to(device)

and it works well! Thanks @ptrblck for your help! I appreciate it
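As a side note, an equivalent fix (just a sketch using torch.load's map_location argument, not what I originally had) would be to map the checkpoint to the CPU while loading it:

device = torch.device("cpu")
# map_location remaps all storages in the checkpoint to the CPU during loading,
# so the model never comes back on the GPU in the first place
model = torch.load("model.pt", map_location=device)
model = model.eval()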