RuntimeError: Given groups=1, weight of size [128, 3, 3, 3], expected input[1, 1, 512, 512] to have 3 channels, but got 1 channels instead

I have a problem with my inpainting code (Segment Anything for the mask, Stable Diffusion for the inpainting).
My code:

import torch
import numpy as np
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline
import cv2
import os
from dotenv import load_dotenv
from segment_anything import sam_model_registry, SamPredictor

load_dotenv()
REV_ANIMATED_MODEL_PATH = os.getenv('REV_ANIMATED_MODEL_PATH')
KANDINSKY_MODEL_PATH = os.getenv('KANDINSKY_MODEL_PATH')
VAE_MODEL_PATH = os.getenv('VAE_MODEL_PATH')
SAM_MODEL_PATH = os.getenv("SAM_MODEL_PATH")
MODEL_TYPE = "vit_b"
CHECKPOINT_PATH = 'sam_vit_b_01ec64.pth'
SDV5_MODEL_PATH = os.getenv('SDV5_MODEL_PATH')
img = ‘inpaint-example.png’

# mask generation function
def mask_generator(img):
    image = cv2.imread(img)  # OpenCV loads images in BGR channel order
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    sam = sam_model_registry[MODEL_TYPE](checkpoint=CHECKPOINT_PATH)
    sam.to(device='cuda')
    mask_predictor = SamPredictor(sam)
    mask_predictor.set_image(image_rgb)
    input_point = np.array([[250, 250]])
    input_label = np.array([1])
    masks, scores, logits = mask_predictor.predict(
        point_coords=input_point,
        point_labels=input_label,
        multimask_output=False,
    )
    mask = masks.astype(float) * 255
    mask = np.transpose(mask, (1, 2, 0))
    _, bw_image = cv2.threshold(mask, 100, 255, cv2.THRESH_BINARY)
    cv2.imwrite('mask.png', bw_image)  # saved as a single-channel (grayscale) PNG

# inpainting function
def inpaint(init_img, mask):
    init_image = Image.open(init_img)
    mask_image = Image.open(mask)
    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        SDV5_MODEL_PATH,
        use_safetensors=True,
        torch_dtype=torch.float32
    ).to('cpu')
    negative_prompt = 'ugly'
    prompt = "a grey cat sitting on a bench, high resolution"
    image = pipe(prompt=prompt,
                 negative_prompt=negative_prompt,
                 image=init_image,
                 mask_image=mask_image
                 ).images[0]
    image.save('output.png')

mask_generator(img)
inpaint(img, ‘mask.png’)

And I see this error in the terminal: RuntimeError: Given groups=1, weight of size [128, 3, 3, 3], expected input[1, 1, 512, 512] to have 3 channels, but got 1 channels instead

Can anyone help?

Hi Dmitr!

I haven’t really looked at the code you posted, but I believe the following is happening:

Likely, the first layer of your model is a convolution layer that expects three channels (e.g., RGB color), while you are inputting a one-channel image (e.g., grayscale).

Consider:

>>> import torch
>>> torch.__version__
'2.6.0+cu126'
>>> conv_1 = torch.nn.Conv2d (1, 128, 3)
>>> conv_3 = torch.nn.Conv2d (3, 128, 3)
>>> t = torch.randn (1, 1, 512, 512)
>>> conv_1 (t).shape
torch.Size([1, 128, 510, 510])
>>> conv_3 (t).shape
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<path_to_pytorch_install>/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<path_to_pytorch_install>/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<path_to_pytorch_install>/torch/nn/modules/conv.py", line 554, in forward
    return self._conv_forward(input, self.weight, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<path_to_pytorch_install>/torch/nn/modules/conv.py", line 549, in _conv_forward
    return F.conv2d(
           ^^^^^^^^^
RuntimeError: Given groups=1, weight of size [128, 3, 3, 3], expected input[1, 1, 512, 512] to have 3 channels, but got 1 channels instead
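As a small sketch in plain PyTorch (no diffusers involved), here is how to read the shapes in that message: the weight [128, 3, 3, 3] is laid out as [out_channels, in_channels, kernel_h, kernel_w], and the input [1, 1, 512, 512] as [batch, channels, height, width]. With groups=1, the input's channel count must equal the weight's in_channels. The 64x64 size is an assumption chosen only to keep the demo light.

```python
import torch

# A Conv2d built for 3 input channels, matching the weight shape in the error.
conv = torch.nn.Conv2d(in_channels=3, out_channels=128, kernel_size=3)

# PyTorch stores Conv2d weights as [out_channels, in_channels, kernel_h, kernel_w].
print(conv.weight.shape)  # torch.Size([128, 3, 3, 3])

# Inputs are [batch, channels, height, width]; 64x64 keeps the demo small.
gray = torch.randn(1, 1, 64, 64)  # 1 channel: does not match weight.shape[1]
rgb = torch.randn(1, 3, 64, 64)   # 3 channels: matches weight.shape[1]

try:
    conv(gray)
except RuntimeError as err:
    print("gray input rejected:", err)

# No padding, kernel 3: each spatial side shrinks by 2.
print(conv(rgb).shape)  # torch.Size([1, 128, 62, 62])
```

So the fix is on the data side: whatever image reaches that layer needs three channels.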

Best.

K. Frank

Thanks for the answer.

Did I understand correctly that I need to convert the 1-channel image to a 3-channel image?

If black and white is all you have, you could just repeat it along the channel dimension 3 times.

if image.size()[1] == 1:  # check if it's single channel (image is an N x C x H x W tensor)
    image = image.expand(-1, 3, -1, -1)  # expand to 3 channels
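A self-contained sketch of that snippet in action, assuming `image` is a PyTorch tensor in [batch, channels, height, width] layout (the batch of 2 random 64x64 images is hypothetical). `expand` returns a view that reuses the single channel without copying memory; `repeat(1, 3, 1, 1)` would make a real copy if you need one.

```python
import torch

# Hypothetical single-channel batch: [batch, channels, height, width].
image = torch.randn(2, 1, 64, 64)

if image.size()[1] == 1:  # single channel (grayscale)
    # View with the one channel repeated 3 times; no data is copied.
    image = image.expand(-1, 3, -1, -1)

# A 3-input-channel conv, like the one in the error message, now accepts it.
conv_3 = torch.nn.Conv2d(3, 128, 3)
out = conv_3(image)

print(image.shape)  # torch.Size([2, 3, 64, 64])
print(out.shape)    # torch.Size([2, 128, 62, 62])
```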

Hi. Is image an OpenCV object?

I’m providing you with an example. You’ll have to figure out where in your code the image first enters the model, and then apply that code to it.

By the way, your code is very difficult to read as currently displayed. You need to wrap all code with three backticks before and after. (The backtick is on the same key as ~.)
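On the OpenCV question: `cv2.imread` returns a NumPy array, not a tensor, so `.size()[1]` and `.expand()` do not apply to it. NumPy, OpenCV and PIL arrays are channels-last, (height, width[, channels]), so the channel axis is the last one rather than dim 1. A minimal sketch, assuming a hypothetical 512x512 mask like the one the question writes to `mask.png`; with OpenCV, `cv2.cvtColor(mask2d, cv2.COLOR_GRAY2RGB)` on a 2-D array should give the same result.

```python
import numpy as np

# Hypothetical single-channel mask: 2-D uint8, white (255) where the object is.
mask = np.zeros((512, 512), dtype=np.uint8)
mask[100:300, 150:350] = 255

# Accept both (H, W) and (H, W, 1) by making sure there is a channel axis.
if mask.ndim == 2:
    mask = mask[:, :, np.newaxis]

# Channels-last: repeat the single channel 3 times along the LAST axis.
mask_3ch = np.repeat(mask, 3, axis=2)

print(mask_3ch.shape)  # (512, 512, 3)
```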

Hello!
Here is my main inpainting function:

def inpaint(init_img, mask):
    init_image = Image.open(init_img)
    mask_image = Image.open(mask)

#Your code
    if mask_image.size()[1] == 1:  # check if it's single channel
        mask_image = mask_image.expand(-1, 3, -1, -1)
#

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        SDV5_MODEL_PATH,
        use_safetensors=True,
        torch_dtype=torch.float32
    ).to('cpu')

    negative_prompt = 'ugly'
    prompt = "a grey cat sitting on a bench, high resolution"
    image = pipe(prompt=prompt,
                 negative_prompt=negative_prompt,
                 image=init_image,
                 mask_image=mask_image
                 ).images[0]
    image.save('output.png')

When I inserted your code, PyCharm started showing errors.
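A likely reason PyCharm complains: `mask_image` here is a PIL Image, not a tensor. On a PIL Image, `size` is a (width, height) tuple attribute rather than a method, so `mask_image.size()[1]` fails, and the tensor-style `.expand(-1, 3, -1, -1)` call does not apply either. A sketch with hypothetical in-memory images standing in for `Image.open(...)`, showing where PIL keeps the channel information instead:

```python
from PIL import Image

# Hypothetical stand-ins for Image.open(...) results.
mask_image = Image.new("L", (512, 256))    # single-channel grayscale
init_image = Image.new("RGB", (512, 256))  # three-channel color

# size is a (width, height) tuple attribute, not a method.
print(mask_image.size)  # (512, 256)

try:
    mask_image.size()
except TypeError as err:
    print("TypeError:", err)  # 'tuple' object is not callable

# The channel count comes from the image mode / band names instead.
print(mask_image.mode, len(mask_image.getbands()))  # L 1
print(init_image.mode, len(init_image.getbands()))  # RGB 3
```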


Ah. Much better. Try this:


def inpaint(init_img, mask):
    init_image = Image.open(init_img).convert('RGB')  # <<< force 3-channel RGB
    mask_image = Image.open(mask).convert('RGB')  # <<< force 3-channel RGB

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        SDV5_MODEL_PATH,
        use_safetensors=True,
        torch_dtype=torch.float32
    ).to('cpu')

    negative_prompt = 'ugly'
    prompt = "a grey cat sitting on a bench, high resolution"
    image = pipe(prompt=prompt,
                 negative_prompt=negative_prompt,
                 image=init_image,
                 mask_image=mask_image
                 ).images[0]
    image.save('output.png')

That does the same thing as the tensor snippet, but for a PIL Image object: .convert('RGB') turns a single-channel image into a three-channel one.
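To see what `.convert('RGB')` does to a grayscale image, here is a tiny sketch with a hypothetical 4x2 gradient built in memory: the mode goes from "L" (one band) to "RGB" (three bands), and each pixel's gray value is copied into R, G and B, which is the PIL counterpart of repeating the channel three times.

```python
import numpy as np
from PIL import Image

# Hypothetical grayscale image: 2 rows of a 4-step gradient (uint8 2-D -> mode "L").
gray = Image.fromarray(np.array([[0, 85, 170, 255]] * 2, dtype=np.uint8))

rgb = gray.convert("RGB")
arr = np.asarray(rgb)

print(gray.mode, rgb.mode)  # L RGB
print(arr.shape)            # (2, 4, 3) -> (height, width, channels)
print(arr[0, 1])            # [85 85 85]: gray value copied into R, G, B
```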

Thank you very much!
You solved my problem.
