Importing and concatenating images along a given dimension

UPDATED: Clarifying the question and providing my code.

I have a dataset of RGB images as well as the corresponding alpha map images (image mode=P). I would like to import the alpha maps and concatenate them with the RGB images to create a 4-channel RGBA input for the network. Currently, my code creates a 6-channel input instead: trimap_alpha has 3 channels rather than the intended 1. How can I do this properly?

I am creating and loading datasets:

data_dir = 'data'
image_datasets = {}
for label in ['input_training_lowres', 'gt_training_lowres', 'trimap_training_lowres']:
    image_datasets[label] = datasets.ImageFolder(
        os.path.join(data_dir, label),
        data_transforms[label]
    )

dataloaders = {}
for label in ['input_training_lowres', 'gt_training_lowres', 'trimap_training_lowres']:
    dataloaders[label] = torch.utils.data.DataLoader(
        image_datasets[label], batch_size=4, shuffle=True, num_workers=4
    )

dataset_sizes = {}
for label in ['input_training_lowres', 'gt_training_lowres', 'trimap_training_lowres']:
    dataset_sizes[label] = len(image_datasets[label])

and iterating over the data:

for input_training, truth_training, trimap_training in zip(
        dataloaders['input_training_lowres'],
        dataloaders['gt_training_lowres'],
        dataloaders['trimap_training_lowres']):

    rgb_data, target = input_training
    gt_data, gt_target = truth_training
    trimap_data, trimap_target = trimap_training
    trimap_alpha = trimap_data
    ground_truth = gt_data
    # concatenate along dim 1, the channel dimension of an NCHW batch
    inputs = torch.cat((rgb_data, trimap_alpha), 1)

Traceback:

Traceback (most recent call last):
  File "D:/MachineLearning/Automatter/Automatter/smallEDNetwork6_val.py", line 258, in <module>
    model_ft = train_model(model_ft, optimizer_ft, 1)
  File "D:/MachineLearning/Automatter/Automatter/smallEDNetwork6_val.py", line 175, in train_model
    predicted_truth = model(inputs)
  File "C:\Users\Nic\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "D:/MachineLearning/Automatter/Automatter/smallEDNetwork6_val.py", line 46, in forward
    out1 = F.relu(self.conv1(x))
  File "C:\Users\Nic\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Nic\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\conv.py", line 277, in forward
    self.padding, self.dilation, self.groups)
  File "C:\Users\Nic\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\functional.py", line 90, in conv2d
    return f(input, weight, bias)
RuntimeError: Given groups=1, weight[64, 4, 11, 11], so expected input[4, 6, 224, 224] to have 4 channels, but got 6 channels instead

Process finished with exit code 1

You would most likely need a custom Dataset: http://pytorch.org/docs/master/data.html#torch.utils.data.Dataset. With that, you can easily feed it into a DataLoader and train from there.
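A minimal sketch of such a Dataset, assuming the three folders contain matching file names (MattingDataset and its directory arguments are hypothetical names, not taken from your code):

import os
from PIL import Image
import torch
from torch.utils.data import Dataset
from torchvision import transforms

class MattingDataset(Dataset):
    """Pairs each RGB image with its trimap and ground truth by file name
    and returns a 4-channel input tensor plus the ground truth."""

    def __init__(self, rgb_dir, trimap_dir, gt_dir, crop_size=224):
        self.rgb_dir = rgb_dir
        self.trimap_dir = trimap_dir
        self.gt_dir = gt_dir
        self.names = sorted(os.listdir(rgb_dir))
        self.crop = transforms.CenterCrop(crop_size)
        self.to_tensor = transforms.ToTensor()

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        rgb = Image.open(os.path.join(self.rgb_dir, name)).convert('RGB')
        trimap = Image.open(os.path.join(self.trimap_dir, name)).convert('L')
        gt = Image.open(os.path.join(self.gt_dir, name)).convert('L')
        rgb = self.to_tensor(self.crop(rgb))        # 3 x 224 x 224
        trimap = self.to_tensor(self.crop(trimap))  # 1 x 224 x 224
        gt = self.to_tensor(self.crop(gt))
        # per-sample tensors are CHW, so the channel dimension is 0
        return torch.cat((rgb, trimap), 0), gt

Wrapping this in a single DataLoader also removes a subtle hazard in the zip approach: three DataLoaders with shuffle=True shuffle independently, so the RGB image, trimap, and ground truth in one batch would not correspond to each other.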

Updated the question and provided a specific case. I have a dataset of RGB images as well as the corresponding alpha map images (which contain only black, white, and 0.5 grey). I would like to import and concatenate the alpha maps with the RGB images to create a 4-channel RGBA input for the network. Currently, my code is creating a 6-channel input, presumably because it is concatenating RGB with RGB instead of RGB with A. How can I do this properly?

Just a stab in the dark… are you sure trimap_alpha has only one channel?

No, it seems that trimap_alpha has 3 channels. I’ve double-checked now: the alpha maps are P images and the color images are RGB.

<PIL.PngImagePlugin.PngImageFile image mode=P size=800x497 at 0x22937E158D0>
<PIL.PngImagePlugin.PngImageFile image mode=RGB size=800x497 at 0x22937EBD978>

How should I process the alpha maps?

What format are the trimap images stored in and how does your script load them?

They are all PNG images, and they’re being loaded at the same time as the RGB images, using the same ImageFolder / DataLoader code shown in the question above.


At a guess they are stored in a palette-based PNG format, and not as simple 1-channel greyscale.
Converting with trimap_alpha.convert("L") might do the trick. It might be wise to save one converted image in order to visually check that the result is correct.
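For example, a quick check on one file (the path is just a placeholder):

from PIL import Image

trimap = Image.open('data/trimap_training_lowres/class_a/example.png')
print(trimap.mode)             # 'P' for palette-based PNGs
gray = trimap.convert('L')     # collapse the palette to 1-channel greyscale
gray.save('trimap_check.png')  # open this file to verify it looks right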


I can’t see what is wrong here. According to the source, the images are loaded using either accimage or PIL, and the PIL loader converts images to RGB when loading them. I’m not sure what accimage would do, but as far as I can tell PIL is the default.
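For reference, torchvision’s default PIL loader is roughly the following (paraphrased from the source, not copied verbatim), which would turn a P-mode trimap into 3 channels before any transform runs:

from PIL import Image

def pil_loader(path):
    # torchvision opens the file itself, then forces a conversion
    # to RGB regardless of the image's original mode
    with open(path, 'rb') as f:
        img = Image.open(f)
        return img.convert('RGB')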

Maybe one of the transforms is causing trouble.

It is also worth mentioning that torchvision provides a Grayscale transform that might be useful: http://pytorch.org/docs/master/torchvision/transforms.html#torchvision.transforms.Grayscale


I think something like that will work. However, trimap_alpha is already a tensor at that point; I need to convert the image to L before it is transformed to a tensor. How can I do that?

I forgot to include my data transforms. Sorry about that.

data_transforms = {
    'input_training_lowres': transforms.Compose([
        transforms.CenterCrop(224),
        # transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        # transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'gt_training_lowres': transforms.Compose([
        # transforms.Scale(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        # transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'trimap_training_lowres': transforms.Compose([
        # transforms.Scale(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        # transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

I assume you could simply add a grayscale transform to the list of transforms.
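For instance, the trimap pipeline could become something like this (untested; Grayscale has to come before ToTensor because it operates on the PIL image):

from torchvision import transforms

data_transforms['trimap_training_lowres'] = transforms.Compose([
    transforms.CenterCrop(224),
    transforms.Grayscale(num_output_channels=1),  # collapse to a single channel
    transforms.ToTensor(),                        # now yields a 1 x 224 x 224 tensor
])

With that in place, torch.cat((rgb_data, trimap_alpha), 1) should produce the intended 4-channel batch.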


Works perfectly. Thank you very much for your help.

Hi,
what if I have a set of images that are already in RGBA format, I mean not separate images for the channels? How can I modify this code for that?
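One untested possibility: ImageFolder accepts a custom loader function, so you could open the files as RGBA yourself instead of letting the default loader convert them to RGB. A sketch (the directory path is a placeholder):

from PIL import Image
from torchvision import datasets, transforms

def rgba_loader(path):
    # keep all four channels instead of the default RGB conversion
    with open(path, 'rb') as f:
        return Image.open(f).convert('RGBA')

dataset = datasets.ImageFolder(
    'data/rgba_training',
    transform=transforms.Compose([
        transforms.CenterCrop(224),
        transforms.ToTensor(),  # an RGBA input gives a 4 x 224 x 224 tensor
    ]),
    loader=rgba_loader,
)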