PyTorch Model Training - operands could not be broadcast together with shapes (1024,1024,5) (3,)

Hey guys, I’m facing a problem while trying to train a segmentation model; I’m new to PyTorch.

I’m trying to reproduce code from the Segmentation Models library, more specifically this example code, with a custom dataset.

The dataset contains photos of plants taken from different perspectives on different days; the plants either have a disease on their leaves or not. If a leaf has a disease, its mask contains the segmentation of the whole leaf. The photographs were taken with multispectral imaging to capture the disease spectrum response at 460, 540, 640, 640, 700, 775 and 875 nm, and are 1900x3000. So I want input_channels=5, and the masks have 6 classes.

So for example the training folder format of the dataset is:

    ├── train_images
    │   ├── plant1_day0_pov1_disease
    │       ├── image460.jpg
    │       ├── image540.jpg
    │       ├── image640.jpg
    │       ├── image775.jpg
    │       ├── image875.jpg
    │   └── plant1_day0_pov2_disease
    │       ├── image460.jpg
    │       ├── image540.jpg
    │       ├── image640.jpg
    │       ├── image775.jpg
    │       ├── image875.jpg
    │   └── etc...
    ├── train_annot
    │   ├── plant1_day0_pov1_disease.png
    │   ├── plant1_day0_pov2_disease.png
    │   └── etc...

I have changed the whole code to make it work with this dataset (DataLoaders, augmentations, resizing to 1024x1024) and to make the model accept 5 input channels. The problem is that when I try to train, I get a ValueError: operands could not be broadcast together with shapes (1024,1024,5) (3,).

My code is available on Colab here. If you want a sample of the dataset to reproduce the issue, please feel free to ask.

I would appreciate it if anyone could help me.
Thanks in advance!

Based on the stacktrace of the error message:

/usr/local/lib/python3.10/dist-packages/segmentation_models_pytorch/encoders/ in preprocess_input(x, mean, std, input_space, input_range, **kwargs)
     13     if mean is not None:
     14         mean = np.array(mean)
---> 15         x = x - mean
     17     if std is not None:

ValueError: operands could not be broadcast together with shapes (1024,1024,5) (3,) 

it seems your model normalizes the inputs with 3 values by default, so either disable that normalization somehow or pass 5 values for the mean (and most likely the std as well).
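For reference, here is a minimal NumPy sketch of why the subtraction fails and why a 5-element mean fixes it (the 0.5 values are just placeholders, not real stats):

```python
import numpy as np

x = np.zeros((1024, 1024, 5))            # a 5-channel image like yours

mean3 = np.array([0.485, 0.456, 0.406])  # ImageNet RGB mean: shape (3,)
try:
    x - mean3                            # trailing dims 5 vs 3 don't match
except ValueError as e:
    print(e)                             # operands could not be broadcast ...

mean5 = np.array([0.5] * 5)              # placeholder per-channel means
y = x - mean5                            # (1024,1024,5) - (5,) broadcasts fine
print(y.shape)                           # (1024, 1024, 5)
```

NumPy broadcasting only lines up trailing dimensions, so a (3,) mean can never be subtracted from a (..., 5) array; the mean's length has to match the channel count.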


Thanks for the fast response!
So I tried changing get_preprocessing to:

def get_preprocessing(preprocessing_fn, mean=None, std=None):
    transform = [
        A.Lambda(name='preprocessing_fn', image=preprocessing_fn),
        A.Lambda(name='to_tensor', image=to_tensor, mask=to_tensor),
    ]
    # If mean and std are provided, insert a normalization step
    if mean is not None and std is not None:
        normalize = A.Normalize(mean=mean, std=std)
        transform.insert(1, normalize)
    return A.Compose(transform, is_check_shapes=False)

and then tried defining test mean and std values:

# Define the mean and std values for your 5-channel data
mean_values = [5, 5, 5, 5, 5]
std_values = [5, 5, 5, 5, 5]

# Create preprocessing functions with the mean and std values
train_preprocessing = get_preprocessing(preprocessing_fn, mean=mean_values, std=std_values)

but nothing changes, same error message. Any ideas on how to disable the normalization of the inputs?

If the stacktrace still points to the same line of code, add debug print statements and check the shape of all used objects to make sure the right stats are passed and used.
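As a sketch of that debugging approach: you could wrap the function you pass into A.Lambda with a hypothetical debug wrapper (debug_wrap below is not part of any library) that prints shapes before and after, so you can see which step still applies the 3-value stats:

```python
import numpy as np

def debug_wrap(fn, name="preprocessing_fn"):
    """Wrap an albumentations-style image function to print array shapes."""
    def wrapped(image, **kwargs):
        print(f"{name} input shape:", image.shape)
        out = fn(image, **kwargs)
        print(f"{name} output shape:", out.shape)
        return out
    return wrapped

# usage sketch: pass the wrapped function into A.Lambda instead, e.g.
# A.Lambda(name='preprocessing_fn', image=debug_wrap(preprocessing_fn))

# tiny demo with an identity function standing in for preprocessing_fn
demo = debug_wrap(lambda image, **kw: image, name="identity")
out = demo(np.zeros((8, 8, 5)))
print(out.shape)  # (8, 8, 5)
```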


The problem is here:

    for source_name, source_url in sources.items():
        pretrained_settings[model_name][source_name] = {
            "url": source_url,
            "input_size": [3, 224, 224],
            "input_range": [0, 1],
            "mean": [0.485, 0.456, 0.406],
            "std": [0.229, 0.224, 0.225],
            "num_classes": 1000,
        }
Do you know how I should change those? I mean, I could change input_size and num_classes by hand, but what about the mean/std values?

You could calculate the stats from your training dataset and change them manually as well. The posted values are from ImageNet, and since your images contain captured spectral data, the stats will likely differ.
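A sketch of how you could compute those per-channel stats, assuming your loader yields HxWxC arrays scaled to [0, 1] (channel_stats is a hypothetical helper, not part of the library):

```python
import numpy as np

def channel_stats(images):
    """Accumulate per-channel mean/std over an iterable of HxWxC arrays,
    without holding the whole dataset in memory at once."""
    total = None
    total_sq = None
    count = 0
    for img in images:
        img = img.astype(np.float64)
        if total is None:
            total = np.zeros(img.shape[-1])
            total_sq = np.zeros(img.shape[-1])
        total += img.sum(axis=(0, 1))           # per-channel sum of values
        total_sq += (img ** 2).sum(axis=(0, 1)) # per-channel sum of squares
        count += img.shape[0] * img.shape[1]    # pixels seen per channel
    mean = total / count
    std = np.sqrt(total_sq / count - mean ** 2)  # Var = E[x^2] - E[x]^2
    return mean, std

# tiny demo with two constant 5-channel images
imgs = [np.full((4, 4, 5), 0.2), np.full((4, 4, 5), 0.4)]
mean, std = channel_stats(imgs)
print(mean)  # [0.3 0.3 0.3 0.3 0.3]
print(std)   # [0.1 0.1 0.1 0.1 0.1]
```

The resulting 5-element mean and std can then be plugged into the pretrained_settings dict (or your Normalize call) in place of the 3-element ImageNet values.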