DC GAN Modification — RGB to Grayscale

web_tracer · July 29, 2021, 2:10pm

The full code is available here: DCGAN Tutorial (PyTorch Tutorials 1.9.0+cu102 documentation)

What is a proper way to modify the code above to adjust the model for a dataset that consists of black and white/grayscale images, not RGB?

This model is designed for processing 3-channel images (RGB) while I need to handle some black and white image data, so I’d like to change the “nc” parameter to “1” instead of “3.”

But if I just change this parameter (“nc = 3” → “nc = 1”) without adjusting the generator and discriminator code blocks, running the script fails with this error:

RuntimeError: Given groups=1, weight of size [64, 1, 4, 4], expected input[128, 3, 64, 64] to have 1 channels, but got 3 channels instead

Please advise.

ayalaa2 · July 29, 2021, 7:27pm

Changing nc to 1 will give you a generator and discriminator that handle one channel (grayscale). However, you also need to adjust the data loading/transform step so that the images are converted to grayscale. That error suggests this is not happening yet: the batch reaching the discriminator still has 3 channels.
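To make the mismatch in that error concrete, here is a minimal sketch. It assumes the discriminator's first layer is built as nn.Conv2d(nc, ndf, 4, 2, 1, bias=False) with nc = 1 and ndf = 64, which matches the weight size [64, 1, 4, 4] in the message. The batches are random stand-ins, not real data.

```python
import torch
import torch.nn as nn

nc, ndf = 1, 64  # one input channel, 64 feature maps -> weight [64, 1, 4, 4]
first_layer = nn.Conv2d(nc, ndf, kernel_size=4, stride=2, padding=1, bias=False)

rgb_batch = torch.randn(8, 3, 64, 64)   # what an unchanged dataloader still yields
gray_batch = torch.randn(8, 1, 64, 64)  # what the layer expects once nc = 1

try:
    first_layer(rgb_batch)
    rgb_error = None
except RuntimeError as err:
    rgb_error = str(err)  # the channel-mismatch message

out = first_layer(gray_batch)  # works: input channels match nc
print(rgb_error)
print(tuple(out.shape))
```

As far as I know, ImageFolder's default loader opens every file as RGB, so even grayscale files on disk arrive with 3 channels unless a transform converts them.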

You can add a transform for this; see torchvision.transforms (Torchvision 0.10.0 documentation)

web_tracer · July 30, 2021, 6:09am

Hi Alex,

Thanks for stepping in! Can you please shed some light on the proper way to adjust the data loading/transform phase, step by step?

I’m afraid I’m a bit stuck here.

Thanks in advance.

ayalaa2 · July 30, 2021, 8:47pm

Sure, so the dataset object creation is the issue here:

dataset = dset.ImageFolder(root=dataroot,
                           transform=transforms.Compose([
                               transforms.Resize(image_size),
                               transforms.CenterCrop(image_size),
                               transforms.ToTensor(),
                               transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
                           ]))

Specifically, we’re interested in those transforms. Each datapoint starts off as a PIL image. The transforms suggest that each datapoint will:

1. Get resized
2. Get center-cropped
3. Get turned into a tensor
4. Get normalized. Here the first tuple denotes the mean for each channel and the second tuple denotes the std for each channel.
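As a quick illustration of the normalization step: Normalize computes (x - mean) / std per channel, so with mean 0.5 and std 0.5 the [0, 1] range produced by ToTensor maps to [-1, 1]. If I recall correctly, that is also the range of the Tanh at the end of the tutorial's generator. The helper below is a hypothetical plain-Python stand-in for the arithmetic, not the torchvision implementation.

```python
def normalize(x, mean=0.5, std=0.5):
    """Same arithmetic Normalize applies to each value of a channel."""
    return (x - mean) / std

# Darkest, mid-gray, and brightest pixel values after ToTensor
for x in (0.0, 0.5, 1.0):
    print(x, "->", normalize(x))
```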

We’ll need to do two things here: add a grayscale transform and, because we’ll then have only one channel, adjust the normalization step to use a single mean and a single std.

We’ll end up with something like this:

dataset = dset.ImageFolder(root=dataroot,
                           transform=transforms.Compose([
                               transforms.Resize(image_size),
                               transforms.CenterCrop(image_size),
                               # Convert the PIL image to a single channel
                               transforms.Grayscale(num_output_channels=1),
                               transforms.ToTensor(),
                               # One mean and one std for the single channel
                               transforms.Normalize((0.5,), (0.5,)),
                           ]))

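Each image tensor returned by this dataset will now have a single channel, i.e. shape [1, image_size, image_size], which is what the model expects with nc = 1.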

web_tracer · August 1, 2021, 8:47am

Thank you very much! It worked.