Extracting feature vector for grey images via ResNet18: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]

I have 600x800 images that have only 1 channel. I am trying to use a pre-trained ResNet18 to extract their feature vectors; however, the code expects 3 channels:

import torch
import torchvision
import torchvision.models as models
from PIL import Image

img = Image.open("labeled-data/train_moth/moth/frame163.png")


# Load the pretrained model
model = models.resnet18(pretrained=True)

# Use the model object to select the desired layer
layer = model._modules.get('avgpool')

# Set model to evaluation mode
model.eval()

transforms = torchvision.transforms.Compose([
    torchvision.transforms.Resize(256),
    torchvision.transforms.CenterCrop(224),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])


def get_vector(image):
    # Create a PyTorch tensor with the transformed image
    t_img = transforms(image)
    t_img = torch.cat((t_img, t_img, t_img), 0)
    # Create a vector of zeros that will hold our feature vector
    # The 'avgpool' layer has an output size of 512
    my_embedding = torch.zeros(512)

    # Define a function that will copy the output of a layer
    def copy_data(m, i, o):
        my_embedding.copy_(o.flatten())                 # <-- flatten

    # Attach that function to our selected layer
    h = layer.register_forward_hook(copy_data)
    # Run the model on our transformed image
    with torch.no_grad():                               # <-- no_grad context
        model(t_img.unsqueeze(0))                       # <-- unsqueeze
    # Detach our copy function from the layer
    h.remove()
    # Return the feature vector
    return my_embedding

Here’s the error I am getting:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-5-59ab45f8c1e6> in <module>
     42 
     43 
---> 44 pic_vector = get_vector(img)

<ipython-input-5-59ab45f8c1e6> in get_vector(image)
     21 def get_vector(image):
     22     # Create a PyTorch tensor with the transformed image
---> 23     t_img = transforms(image)
     24     t_img = torch.cat((t_img, t_img, t_img), 0)
     25     # Create a vector of zeros that will hold our feature vector

~/anaconda3/lib/python3.7/site-packages/torchvision/transforms/transforms.py in __call__(self, img)
     59     def __call__(self, img):
     60         for t in self.transforms:
---> 61             img = t(img)
     62         return img
     63 

~/anaconda3/lib/python3.7/site-packages/torchvision/transforms/transforms.py in __call__(self, tensor)
    210             Tensor: Normalized Tensor image.
    211         """
--> 212         return F.normalize(tensor, self.mean, self.std, self.inplace)
    213 
    214     def __repr__(self):

~/anaconda3/lib/python3.7/site-packages/torchvision/transforms/functional.py in normalize(tensor, mean, std, inplace)
    296     if std.ndim == 1:
    297         std = std[:, None, None]
--> 298     tensor.sub_(mean).div_(std)
    299     return tensor
    300 

RuntimeError: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]



Code is from: https://stackoverflow.com/a/63552285/2414957

I thought using

t_img = torch.cat((t_img, t_img, t_img), 0)

would be helpful but I was wrong.

Here’s a bit about image:

$ identify frame163.png 
frame163.png PNG 800x600 800x600+0+0 8-bit Gray 256c 175297B 0.000u 0:00.000

This part of the error suggests the problem is in your use of torchvision.transforms.Normalize.

--> 212         return F.normalize(tensor, self.mean, self.std, self.inplace)

The docs for that name are here: https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.Normalize.

That page describes its arguments as:

  • mean (sequence) – Sequence of means for each channel.
  • std (sequence) – Sequence of standard deviations for each channel.

In your code, you gave 3 means and 3 standard deviations to Normalize, which it will try to use for 3 different channels:
torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
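This also explains why the message mentions two different shapes. Normalize reshapes each per-channel list to [C, 1, 1] (the `std[:, None, None]` line in the traceback) and then subtracts in place with `tensor.sub_(mean)`. Broadcasting a [1, 224, 224] tensor against a [3, 1, 1] mean would give a [3, 224, 224] result, and that result cannot be written back into the one-channel tensor. Here is a minimal sketch with a random tensor standing in for the transformed image:

```python
import torch

# Stand-in for ToTensor() applied to a grayscale ("L") PIL image: 1 channel.
t = torch.rand(1, 224, 224)

# Normalize turns the 3 ImageNet means into a [3, 1, 1] tensor before subtracting.
mean = torch.tensor([0.485, 0.456, 0.406])[:, None, None]

# Out-of-place subtraction broadcasts to a new 3-channel tensor without complaint.
print((t - mean).shape)  # torch.Size([3, 224, 224])

# The in-place sub_ (what Normalize uses) must store the result in t's [1, 224, 224] shape.
try:
    t.sub_(mean)
except RuntimeError as err:
    print("RuntimeError:", err)
```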

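Since your inputs have only 1 channel, you should pass lists containing a single mean and a single standard deviation. The `torch.cat` in `get_vector` then expands the normalized tensor to the 3 channels ResNet18 expects: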
torchvision.transforms.Normalize(mean=[0.485], std=[0.229])

Hello everybody, I have the same problem.
It returns this error:

File "/usr/local/lib/python3.10/site-packages/torchvision/transforms/functional.py", line 360, in normalize
    return F_t.normalize(tensor, mean=mean, std=std, inplace=inplace)
  File "/usr/local/lib/python3.10/site-packages/torchvision/transforms/functional_tensor.py", line 940, in normalize
    return tensor.sub_(mean).div_(std)
RuntimeError: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]

I changed the transforms.Compose from

transforms.Compose([
                                        transforms.RandomHorizontalFlip(),
                                        transforms.RandomRotation(20),
                                        transforms.Resize(size=(224,224)),
                                        transforms.ToTensor(),
                                        transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))
                                        ])

to

my_transforms = transforms.Compose([
                                        transforms.RandomHorizontalFlip(),
                                        transforms.RandomRotation(20),
                                        transforms.Resize(size=(224,224)),
                                        transforms.ToTensor(),
                                        transforms.Normalize((0.5,), (0.5,))
                                        ])

But I still get the same error.

Can someone please help?

Your modified transformation should work for inputs with a single channel, so make sure that all inputs are using a single channel and that your modified transformation is actually used.


Yes. As @ptrblck suggested, please also check whether your model (ResNet18) is configured to accept single-channel images. See the following post on single-channel ResNet18 if necessary: https://discuss.pytorch.org/t/altering-resnet18-for-single-channel-images/29198/12


Thank you for your answers, @ptrblck and @tiramisuNcustard.
But I don't use a ResNet model; I have to use a Net() model like this:

import torch
import torch.nn as nn
import torch.nn.functional as F

# Define the CNN architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # convolutional layer
        self.conv1 = nn.Conv2d(3, 16, 5)
        # max pooling layer
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(16, 32, 5)
        #self.dropout = nn.Dropout(0.2)
        self.fc1 = nn.Linear(32, 256)
        self.fc2 = nn.Linear(256, 84)
        self.fc3 = nn.Linear(84, 2)
        self.softmax = nn.LogSoftmax(dim=1)

        
    def forward(self, x):
        # add sequence of convolutional and max pooling layers
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        
        #x = self.dropout(x)
        N, C, H, W = x.shape
        x = x.reshape(N, C, -1).mean(dim=-1)
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        #x = self.dropout(...)
        x = self.softmax(x)
        return x

All my input images are the same.
For prediction, I use this code:

import torch
from PIL import Image
from torchvision import transforms
# ... plus the Net class defined above

device = torch.device("cpu")

model = Net()
model.eval()

#model.load_state_dict(torch.load("/Users/Desktop/NEW/model20.pt"))
model.load_state_dict(torch.load("/Users/Desktop/NEW/model.pth"))
# convert the normalized data into a torch.FloatTensor

filename = "/Users/Desktop/NEW/test/IM-0115-0001 2.jpeg"
input_image = Image.open(filename)
preprocess = transforms.Compose([
                                        transforms.RandomHorizontalFlip(),
                                        transforms.RandomRotation(20),
                                        transforms.Resize(size=(224,224)),
                                        transforms.ToTensor(),
                                        transforms.Normalize((0.5,), (0.5,)),
])

In other tests I also changed the transform to:

preprocess = transforms.Compose([
                                        transforms.RandomHorizontalFlip(),
                                        transforms.RandomRotation(20),
                                        transforms.Resize(size=(224,224)),
                                        transforms.ToTensor(),
                                        transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))
                                        ])

input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0) # create a mini-batch as expected by the model

# move the input and the model to the selected device
input_batch = input_batch.to(device)
model.to(device)

with torch.no_grad():
    output = model(input_batch)
# Tensor of shape [2]: log-probabilities for the 2 classes (Net ends with LogSoftmax)
print(output[0])
# Softmax of log-probabilities gives back the probabilities (same as output[0].exp()).
probabilities = torch.nn.functional.softmax(output[0], dim=0)
print(probabilities)

Both versions give the same error:

RuntimeError: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]

@GUEYE, in your Net class the first convolutional layer expects 3-channel images (see below). It should probably be configured to expect single-channel images instead (change the 3 to a 1). Disclaimer: I didn't look at the rest of your code.

self.conv1 = nn.Conv2d(3, 16, 5)
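Below is a minimal sketch of that change. It assumes the rest of `Net` stays as posted and makes the input channel count a constructor argument; the `in_channels` parameter is hypothetical. Because `forward` averages over the spatial dimensions before `fc1`, the 224x224 input size does not affect `fc1`'s 32 input features. Weights saved from a 3-channel `Net` (such as `model.pth`) will not load into a 1-channel one, because `conv1.weight` changes shape from [16, 3, 5, 5] to [16, 1, 5, 5]. You would need to retrain, or keep 3 channels and convert the images to RGB instead.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self, in_channels=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, 16, 5)  # was nn.Conv2d(3, 16, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(16, 32, 5)
        self.fc1 = nn.Linear(32, 256)
        self.fc2 = nn.Linear(256, 84)
        self.fc3 = nn.Linear(84, 2)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.mean(dim=(2, 3))  # global average pool, same as reshape(N, C, -1).mean(dim=-1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return self.softmax(x)  # log-probabilities over the 2 classes

model = Net(in_channels=1).eval()
with torch.no_grad():
    log_probs = model(torch.rand(1, 1, 224, 224))  # batch of one grayscale image
print(log_probs.shape)        # torch.Size([1, 2])
print(log_probs.exp().sum())  # about 1.0, since the outputs are log-probabilities
```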


@tiramisuNcustard Thank you for everything. I solved the problem just by converting all my images to RGB format.