RuntimeError: Input type (unsigned char) and bias type (float) should be the same

The error message appears clear enough but I do not understand why I am getting it or how to fix it.
My code is as follows:

for epoch in range(hyper.epochs):
    epoch_loss = 0
    epoch_accuracy = 0
    
    for data, label in train_loader:
        data = data.to(gpu.device)
        label = label.to(gpu.device)
        
        output = model(data)

The error is from the last line above.
The model is defined as

import torch.nn as nn
# Input layer: represents the input image data.
# Conv layer: extracts features from the image.
# Pooling layer: reduces the spatial size of the feature maps after convolution.
# Fully connected layer: connects one layer of the network to the next.
# Output layer: produces the predicted values.
class Cnn(nn.Module):
    def __init__(self):
        super(Cnn, self).__init__()
        
        self.layer1 = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=0, stride=2),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        
        self.layer2 = nn.Sequential(
            nn.Conv2d(16,32, kernel_size=3, padding=0, stride=2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2)
            )
        
        self.layer3 = nn.Sequential(
            nn.Conv2d(32,64, kernel_size=3, padding=0, stride=2),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        self.fc1 = nn.Linear(3*3*64,10)
        self.dropout = nn.Dropout(0.5)
        self.fc2 = nn.Linear(10,2)
        self.relu = nn.ReLU()
        
        
    def forward(self,x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = self.layer3(out)
        out = out.view(out.size(0),-1)
        out = self.relu(self.fc1(out))
        out = self.fc2(out)
        return out

The error is raised on the line out = self.layer1(x).
The dataloader is reading in images:

import torch
from torch.utils.data import Dataset
from torchvision import transforms
from PIL import Image
from pathlib import Path

class dataset(Dataset):
    def __init__(self, image_paths, dict_classes, logging):
        self.image_paths = image_paths
        self.dict_classes = dict_classes
        self.logging = logging
        
    #dataset length
    def __len__(self):
        return len(self.image_paths)
  
    # load one of the images
    def __getitem__(self, idx):
        img_path = self.image_paths[idx]
        img = Image.open(img_path)
        transform = transforms.Compose([
            transforms.PILToTensor(),
            transforms.Resize((256, 256)),
            transforms.RandomResizedCrop(256)
        ])
        img_tensor = transform(img)
        _key = Path(img_path).parts[3]  
        label = self.dict_classes[_key]
        return img_tensor, label

Based on the error message it seems as if the input tensor uses the uint8 dtype while the model expects float32. Note that PILToTensor keeps the dtype of the input image, which is most likely causing the issue. Use ToTensor() instead to normalize the input image and return it in float32 format, which should fix the error.
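
For reference, a minimal sketch of the fixed transform inside __getitem__, assuming the rest of the dataset class stays the same (ToTensor converts the PIL image to a float32 tensor scaled to [0, 1]):

from torchvision import transforms

transform = transforms.Compose([
    transforms.ToTensor(),               # PIL uint8 image -> float32 CHW tensor in [0, 1]
    transforms.Resize((256, 256)),
    transforms.RandomResizedCrop(256)
])
img_tensor = transform(img)              # img is the PIL image opened in __getitem__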


Also, make sure to either create the transformation object first before calling it, as in the quoted answer, or use the functional API by doing import torchvision.transforms.functional as TF and then calling TF.to_tensor(image).
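
For example, a quick sketch assuming img is the PIL image opened in __getitem__ above:

import torchvision.transforms.functional as TF

img_tensor = TF.to_tensor(img)   # returns a float32 CHW tensor scaled to [0, 1]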

Hi,

I have the same error after switching from torchvision transforms' ToTensor to albumentations. Before, I had been using albumentations for the augmentations and torchvision transforms for ToTensor (the commented-out code below), and that worked fine, but this version doesn't anymore and returns the error in the title.

import cv2
import numpy as np
import albumentations as A
from albumentations.pytorch import ToTensorV2
from torch.utils.data import Dataset


class ImageDataset(Dataset):
    def __init__(self, paths, aug, size):
        self.paths = paths 
        # self.transformer = Transform()
        # self.mask_transform = Mask_Transform()
        self.aug = aug
        self.size = size
        
    
    def __getitem__(self, idx: int):
        image_path, mask_path = self.paths[idx]
        image = cv2.imread(image_path, -1)[:, :, :3]
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        mask = cv2.imread(mask_path, 0) 
        mask = np.where(mask > 0, 1.0, 0) #I originally called this last, switched to see if that was the error
        t = self.aug(image = image, mask = mask)
        image = t['image']
        mask = t['mask']
        
        # image = self.transformer(image) # only called torchvision totensor
        # mask = self.mask_transform(mask) #only called torchvision totensor
        return image, mask
    
    def __len__(self):
        return self.size    
    
    
train_transforms = A.Compose([
                A.HorizontalFlip(p=0.5),
                A.VerticalFlip(p=0.5),
                A.RandomRotate90(p=0.5),
                A.RandomBrightness(limit = 0.1, p = 0.5),  
                A.CLAHE(p=0.2),
                A.RandomBrightnessContrast(p=0.2),    
                A.RandomGamma(p=0.2),
                ToTensorV2()
                ])

Is there something that is wrong here?
Despite all the examples doing the contrary, there is an accepted answer on Stack Overflow that says to move ToTensorV2() to the front; I tried that and still get the same error in the title.

Many thanks in advance. As to why I am trying to switch to albumentations: first, I want to make the pipeline more uniform, and second, I am also still debugging the eval/train thing :sweat_smile:

Based on the source code of ToTensorV2 I guess this error is expected, since no dtype transformations are applied to the input tensors (unless I'm missing them). ToTensorV2 seems to only transpose the image if needed and convert it to a tensor.
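
As a quick check, a sketch assuming the train_transforms pipeline from your post and a uint8 image loaded with cv2:

t = train_transforms(image=image, mask=mask)
print(t['image'].dtype)   # torch.uint8 -- ToTensorV2 keeps the numpy dtype
print(t['mask'].dtype)    # torch.float64 here, since the mask was built with np.where(..., 1.0, 0)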


Thank you! Hmm, does that mean that I should call .float() on the images after?

I'm not deeply familiar with albumentations, so I don't know what their standard workflow is, but using your code I get an image output in torch.uint8 with values in [0, 255], so you might want to normalize the tensor additionally.
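
Something along these lines should work (a sketch, assuming train_loader wraps the ImageDataset above and model is your network):

for image, mask in train_loader:
    image = image.float() / 255.0   # uint8 [0, 255] -> float32 [0, 1]
    output = model(image)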


I see, thank you very much! The docs say the old ToTensor does divide by 255.0, and I was wondering if they just didn't write it for brevity. Thanks again!

I struggled with this problem and this discussion got me almost all the way to the solution (thanks!). While digging through the Albumentations docs I found out that they have a “ToFloat()” transform that you can string onto the end of your A.Compose() transformations. Adding that line changed my tensors from dtype uint8 to float32 and resolved the type discrepancy that was preventing me from running my autoencoder. So for example, this works:

train_transforms = A.Compose(
    [
        A.Flip(p=0.1),
        A.GridDistortion(p=0.05),
        A.ToFloat(),
        ToTensorV2()
    ]
)

If you want to apply no transformations aside from converting your images to float tensors, this works for that purpose:

train_transforms = A.Compose(
    [
        A.ToFloat(),
        ToTensorV2()
    ]
)