Conditional image GAN is diverging

Badr_Belhiti · April 7, 2021, 6:09am

I’m trying to implement a conditional GAN that takes 128x128x1 edge maps of face images and produces the corresponding 128x128x1 “shaded-in” image. In the discriminator, I concatenate the conditional input image and the generated image along the channel dimension. The problem is that the discriminator and generator diverge immediately: the discriminator loss jumps to 100 and the generator loss drops to 0. Here’s my discriminator architecture:

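```python
import torch
import torch.nn as nn

nc = 1    # channels per image (grayscale)
ndf = 32  # base number of discriminator feature maps

class Discriminator(nn.Module):
    def __init__(self, ngpu):
        super(Discriminator, self).__init__()
        self.ngpu = ngpu

        # Spatial size for a 128x128 input: 128 -> 64 -> 32 -> 16 -> 8 -> 4 -> 1
        self.main = nn.Sequential(
            # (N, 2, 128, 128): edges + image stacked on the channel dimension
            nn.Conv2d(nc * 2, ndf * 2, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # (N, 64, 64, 64)
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # (N, 128, 32, 32)
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # (N, 256, 16, 16)
            nn.Conv2d(ndf * 8, ndf * 16, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 16),
            nn.LeakyReLU(0.2, inplace=True),
            # (N, 512, 8, 8)
            nn.Conv2d(ndf * 16, ndf * 32, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 32),
            nn.LeakyReLU(0.2, inplace=True),
            # (N, 1024, 4, 4) -> (N, 1, 1, 1)
            nn.Conv2d(ndf * 32, 1, 4, 1, 0, bias=False),
            nn.Sigmoid()  # probability that the (edges, image) pair is real
        )

    def forward(self, input):
        # input is a tuple (edges, grayscale), each of shape (N, 1, 128, 128)
        edges, grayscale = input
        # Concatenate images on channel dimension -> (N, 2, 128, 128)
        new_input = torch.cat((edges, grayscale), dim=1)
        return self.main(new_input)
```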
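The specific loss values are what `nn.BCELoss` produces once the final `nn.Sigmoid` saturates. PyTorch documents that `nn.BCELoss` clamps its log outputs to be at least -100, so a prediction of exactly 0 or 1 on the wrong label costs 100 rather than infinity. The pure-Python sketch below mirrors that clamp. It assumes the training loop follows the PyTorch DCGAN tutorial (which uses the same `ngpu`/`nc`/`ndf` names), i.e. `nn.BCELoss` on this output, with the discriminator loss logged as `errD_real + errD_fake`. The helpers `clamped_log`, `bce` and `dcgan_losses` are hypothetical names used only for this illustration.

```python
import math

def clamped_log(x):
    """log(x) clamped to >= -100, mirroring the clamp documented for nn.BCELoss.
    Returns -100 for x == 0 instead of raising or returning -inf."""
    return max(math.log(x), -100.0) if x > 0.0 else -100.0

def bce(p, y):
    """Binary cross-entropy for one prediction p in [0, 1] and label y."""
    loss = -(y * clamped_log(p) + (1.0 - y) * clamped_log(1.0 - p))
    return loss + 0.0  # normalise -0.0 to 0.0 for tidier printing

def dcgan_losses(d_real, d_fake):
    """Hypothetical helper: DCGAN-tutorial-style losses given D's output on a
    real pair (label 1) and on a fake pair (label 0)."""
    err_d = bce(d_real, 1.0) + bce(d_fake, 0.0)  # errD_real + errD_fake
    err_g = bce(d_fake, 1.0)                    # G wants D(fake) read as real
    return err_d, err_g

# Sigmoid saturated at 1.0 for every pair: D labels everything "real".
print(dcgan_losses(1.0, 1.0))  # (100.0, 0.0)
# Sigmoid saturated at 0.0 for every pair: D labels everything "fake".
print(dcgan_losses(0.0, 0.0))  # (100.0, 100.0)
```

Under those assumptions, a discriminator loss of exactly 100 together with a generator loss of 0 is consistent with a discriminator whose sigmoid has saturated at 1.0 for every pair, so that it scores even the generated pairs as real. A discriminator stuck at 0.0 would instead give 100 for both losses.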

I assumed this architecture would be fine, because the discriminator should easily be able to learn whether the generated image is consistent with the conditional input, using just a few edge-detection filters and differences. It’s also worth noting that this architecture works fine in a traditional (unconditional) GAN with half the feature maps. Any help is appreciated, thank you!
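The edge-consistency intuition can be made concrete with a fixed, hand-written check in NumPy: derive an edge map from the candidate image using simple finite differences, then measure how far it is from the conditioning edge map. This is an illustration only, not a trained discriminator. The square images are hypothetical toy data, and the `edge_map`, `mismatch` and `square` helpers are assumptions for this sketch (the dataset's edge maps may come from a different detector).

```python
import numpy as np

def edge_map(img: np.ndarray) -> np.ndarray:
    """Crude edge detector: absolute forward differences along columns and rows,
    a stand-in for the learned edge-detection filters."""
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, :-1] = np.abs(np.diff(img, axis=1))
    gy[:-1, :] = np.abs(np.diff(img, axis=0))
    # Clip so pixels where both differences fire still read as 1.
    return np.clip(gx + gy, 0.0, 1.0)

def mismatch(edges: np.ndarray, img: np.ndarray) -> float:
    """Mean absolute difference between the conditioning edges and the edges
    recovered from the candidate image (lower = more consistent)."""
    return float(np.mean(np.abs(edges - edge_map(img))))

def square(size: int, lo: int, hi: int) -> np.ndarray:
    """Toy 'shaded-in' image: a filled square on a black background."""
    img = np.zeros((size, size), dtype=np.float32)
    img[lo:hi, lo:hi] = 1.0
    return img

real = square(128, 32, 96)      # target image
edges = edge_map(real)          # conditional input derived from it
shifted = square(128, 40, 104)  # plausible-looking image that ignores the edges

print(mismatch(edges, real))           # 0.0
print(mismatch(edges, shifted) > 0.0)  # True
```

The image the edges were derived from scores 0, while an equally plausible image that ignores them scores higher; a convolutional discriminator would have to learn an equivalent comparison from the data.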