Issue with layer of convolutional blocks in GAN generator

Hi, I am trying to reproduce a generator for a GAN based on the code from this repo (they use TensorFlow): InfoGAN/model.py at master · lisc55/InfoGAN · GitHub

At some point I get a dimension mismatch error (the error itself is below). I would appreciate any help explaining what I did wrong. Thanks in advance!

Here is my model:

class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        #  opt.latent_dim = 62; opt.n_classes = 5
        input_dim = opt.latent_dim + opt.n_classes # is equal to 67
        
        self.l1 = nn.Sequential(nn.Linear(input_dim, 1024, bias=False))
        self.conv_blocks = nn.Sequential(
            nn.BatchNorm2d(1024),
            nn.ReLU(True),
            nn.Linear(1024, 7*7*128, bias=False),
            nn.BatchNorm2d(7*7*128),
            nn.ReLU(True),
            #nn.Unflatten(7*7*128, [-1, 7, 7, 128]),
            nn.ConvTranspose2d(7*7*128, 64, (4, 4), 2, 1),
            nn.BatchNorm2d(64),
            nn.ReLU(True),
            nn.ConvTranspose2d(7*7*128, 1, (4, 4), 2, 1),
            nn.Tanh()
        )
        

    def forward(self, noise, labels):
        # batch_size is 64 
        # noise shape is (64, 62); labels shape is (64, 5)
        gen_input = torch.cat((noise, labels), -1) # shape is (64, 67)
        out = self.l1(gen_input) # shape is (64, 1024)
        out = out.view(out.shape[0], 1024, 1, 1) # shape is (64, 1024, 1, 1)
        img = self.conv_blocks(out) # <- error appears here
        return img

The error I get is: RuntimeError: mat1 and mat2 shapes cannot be multiplied (65536x1 and 1024x6272)

The original code using TensorFlow is:

def Generator(shape):
    w_init = tf.random_normal_initializer(stddev=0.02)
    gamma_init = tf.random_normal_initializer(1.0, 0.02)
    ni = tl.layers.Input(shape)
    nn = tl.layers.Dense(n_units=1024, b_init=None, W_init=w_init)(ni)
    nn = tl.layers.BatchNorm(decay=0.9, act=tf.nn.relu,
                             gamma_init=gamma_init)(nn)
    nn = tl.layers.Dense(n_units=7*7*128, b_init=None, W_init=w_init)(nn)
    nn = tl.layers.BatchNorm(decay=0.9, act=tf.nn.relu,
                             gamma_init=gamma_init)(nn)
    nn = tl.layers.Reshape([-1, 7, 7, 128])(nn)
    nn = tl.layers.DeConv2d(64, (4, 4), strides=(
        2, 2), padding="SAME", W_init=w_init)(nn)
    nn = tl.layers.BatchNorm(decay=0.9, act=tf.nn.relu,
                             gamma_init=gamma_init)(nn)
    nn = tl.layers.DeConv2d(
        1, (4, 4), strides=(2, 2), padding="SAME", act=tf.nn.tanh, W_init=w_init)(nn)
    return tl.models.Model(inputs=ni, outputs=nn)

You are trying to stack nn.BatchNorm2d layers with nn.Linear layers, which might work but is at least unusual, as the default input shapes do not match. nn.BatchNorm2d layers expect an input of shape [batch_size, channels, height, width], while nn.Linear layers operate on inputs of shape [batch_size, *, in_features], where * denotes additional dimensions the linear layer is applied over (as if you looped over these dimensions). In your case the activation passed to the second linear layer has shape [64, 1024, 1, 1], so its last dimension is 1 rather than the expected 1024, which is what the error message reports.
Could you explain how exactly the reference model treats the dimensions?
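
To make the mismatch concrete, here is a minimal sketch that feeds a tensor of the shape from the question through the same two layer types. It assumes the batch size of 64 and the layer sizes quoted in the question; the variable names are only for illustration.

```python
import torch
import torch.nn as nn

batch_size = 64  # batch size taken from the question

# Activation after the view() call in the question's forward pass
x = torch.randn(batch_size, 1024, 1, 1)

# nn.BatchNorm2d accepts the 4D input and keeps its shape
x = nn.BatchNorm2d(1024)(x)
bn_shape = tuple(x.shape)

# nn.Linear is applied to the last dimension, which has size 1 here, not 1024
linear = nn.Linear(1024, 7 * 7 * 128, bias=False)
try:
    linear(x)
    error_message = None
except RuntimeError as e:
    error_message = str(e)

print(bn_shape)
print(error_message)

# On a 2D input of shape [batch_size, 1024] the same layer works
out_2d = linear(torch.randn(batch_size, 1024))
print(tuple(out_2d.shape))
```

The 4D tensor is flattened to 64 * 1024 = 65536 rows of length 1 before the matrix multiplication, which is where the 65536x1 in the error comes from.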


First of all, thank you very much for the response. I am trying to implement a generator for MNIST digit generation from the InfoGAN paper ([1606.03657] InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets). The proposed architecture is the following:

So I thought that treating every FC entry as a Linear layer was the right thing to do.

Not sure whether I answered your question…

Yes, treating the FC layers as nn.Linear layers is correct, but the more interesting question is what shape the inputs have and how each layer is actually applied to them.
From the screenshot, I guess the first FC as well as the batchnorm layer would work on a 2D input, while you are reshaping the activation to pass it to an nn.BatchNorm2d layer.
However, the screenshot misses these details and doesn't show the activation shapes.
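
For reference, here is one possible way to port the TensorFlow generator from the first post. It is only a sketch under a few assumptions: both FC layers and their batchnorm layers work on 2D activations (so nn.BatchNorm1d is used there), and the reshape happens right before the first transposed convolution. PyTorch is channels-first, so the activation is reshaped to [128, 7, 7] instead of [7, 7, 128]. The custom weight initializers and the batchnorm decay of the original are not reproduced, and the default argument values are taken from the comments in the question.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Generator(nn.Module):
    def __init__(self, latent_dim=62, n_classes=5):
        super().__init__()
        # Fully connected part: operates on 2D inputs [batch_size, features]
        self.fc = nn.Sequential(
            nn.Linear(latent_dim + n_classes, 1024, bias=False),
            nn.BatchNorm1d(1024),
            nn.ReLU(True),
            nn.Linear(1024, 7 * 7 * 128, bias=False),
            nn.BatchNorm1d(7 * 7 * 128),
            nn.ReLU(True),
        )
        # Convolutional part: operates on 4D inputs [batch_size, C, H, W]
        self.conv_blocks = nn.Sequential(
            nn.Unflatten(1, (128, 7, 7)),
            nn.ConvTranspose2d(128, 64, 4, 2, 1),  # 7x7 -> 14x14
            nn.BatchNorm2d(64),
            nn.ReLU(True),
            nn.ConvTranspose2d(64, 1, 4, 2, 1),  # 14x14 -> 28x28
            nn.Tanh(),
        )

    def forward(self, noise, labels):
        gen_input = torch.cat((noise, labels), -1)
        return self.conv_blocks(self.fc(gen_input))


# Quick shape check with random noise and random one-hot labels
gen = Generator()
noise = torch.randn(64, 62)
labels = F.one_hot(torch.randint(0, 5, (64,)), num_classes=5).float()
img = gen(noise, labels)
print(tuple(img.shape))
```

Note that the second transposed convolution takes 64 input channels (the output of the previous one), not 7*7*128 as in the original attempt.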

Well, the input to the first FC was random noise of size 62 concatenated with one ten-dimensional vector, which is meant to provide the generator with some useful information.

Now I use another architecture, namely:

I implemented it this way (it seems to work better):

import torch
import torch.nn as nn
import torch.nn.functional as F


class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        
        self.input_size = 190
        
        self.l1 = nn.Sequential(nn.Linear(self.input_size, 448*2*2))

        self.bn1 = nn.BatchNorm2d(448)

        self.tconv2 = nn.ConvTranspose2d(448, 256, 4, 2, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(256)

        self.tconv3 = nn.ConvTranspose2d(256, 128, 4, 2, padding=1, bias=False)

        self.tconv4 = nn.ConvTranspose2d(128, 64, 4, 2, padding=1, bias=False)

        self.tconv5 = nn.ConvTranspose2d(64, 3, 4, 2, padding=1, bias=False)

    def forward(self, x):
        x = x.view(x.shape[0], self.input_size)
        x = self.l1(x)
        x = x.view(x.shape[0], 448, 2, 2)
        x = F.relu(self.bn1(x))
        x = F.relu(self.bn2(self.tconv2(x)))
        x = F.relu(self.tconv3(x))
        x = F.relu(self.tconv4(x))

        img = torch.tanh(self.tconv5(x))

        return img
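
As a quick sanity check of the spatial sizes in this second generator, the sketch below applies the nn.ConvTranspose2d output-size formula (for dilation=1 and output_padding=0) four times, starting from the 2x2 activation. The helper name tconv_out_size is made up for this example.

```python
import torch
import torch.nn as nn


def tconv_out_size(size, kernel_size=4, stride=2, padding=1):
    # Output size of nn.ConvTranspose2d with dilation=1 and output_padding=0
    return (size - 1) * stride - 2 * padding + kernel_size


# Spatial size after each of the four transposed convolutions
sizes = [2]
for _ in range(4):
    sizes.append(tconv_out_size(sizes[-1]))
print(sizes)

# Cross-check the first step against an actual layer with the same settings
layer = nn.ConvTranspose2d(448, 256, 4, 2, padding=1, bias=False)
out = layer(torch.randn(1, 448, 2, 2))
print(tuple(out.shape))
```

Each layer with kernel 4, stride 2 and padding 1 doubles the spatial size, so the generator above should return images of shape [batch_size, 3, 32, 32].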