Inferring the size of a linear layer

Hi,

I am trying to implement a GAN network using the documentation and the DCGAN tutorial provided. My discriminator looks like this:

class Discriminator(torch.nn.Module):
  
  def __init__(self, ngpu):
    super().__init__()
    self.ngpu = ngpu
    self.discriminator = torch.nn.Sequential(
        torch.nn.Conv2d(args["dim_Image"][2],
                        64,
                        3,
                        stride=1,
                        padding=0,
                        bias=False),
        # channel size (64, 95, 95)
        torch.nn.Tanh(),
        torch.nn.Conv2d(64,
                        128,
                        5,
                        stride=2,
                        padding=1,
                        bias=False),
        # channel size (128, 47, 47)
        torch.nn.Tanh(),
        torch.nn.Conv2d(128,
                        256,
                        5,
                        stride=2,
                        padding=1,
                        bias=False),
        # channel size (256, 23, 23)
        torch.nn.Tanh(),
        torch.nn.Conv2d(256,
                        512,
                        5,
                        stride=2,
                        padding=1,
                        bias=False),
        # channel size (512, 11, 11)
        torch.nn.Tanh(),
        torch.nn.Conv2d(512,
                        1024,
                        5,
                        stride=2,
                        padding=1,
                        bias=False),
        # channel size (1024, 5, 5)
        torch.nn.Tanh(),
        torch.nn.Conv2d(1024,
                        1024,
                        5,
                        stride=1,
                        padding=0,
                        bias=False),
        # channel size (1024, 1, 1)
        torch.nn.Tanh(),
        torch.nn.Linear(args["batch_size"]*1024,
                       args["batch_size"],
                       bias=False),
        torch.nn.Sigmoid())
    
  def forward(self, input):
    return self.discriminator(input)

This code throws an error, and rightly so. However, when I change the torch.nn.Linear() call to

torch.nn.Linear(1024, 1, bias=False),

it still throws an error. I am not sure I correctly understand how the Sequential module works. In my opinion, the code above shouldn't throw an error for a single image. Shouldn't the same hold for a batch?

If someone could please explain how batches are processed and how to rectify my code, it would be hugely helpful.

Thanking you and warm regards,
Nirmal


There are a couple of things here.

  1. First of all, the batch size should not be given as the input size to the Linear layer. The input size is independent of the batch size.

  2. The linear layer cannot be placed in the same Sequential right after the conv layers, because its input needs to be reshaped first. So you have to create two sections: one Sequential for all the conv layers (and their pooling, activations, …); the output of that is reshaped into a flat tensor of shape batch-size x ?, and then passed to the linear layers.

    self.conv_layers = nn.Sequential(
        ## put the conv layers here
    )

    self.fc_layers = nn.Sequential(
        ## put the FC layers here
    )

    def forward(self, x):
        x = self.conv_layers(x)
        x = x.view(-1, ??)  ## we need to know the value of ?? for the input size of the first linear layer
        return self.fc_layers(x)
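To make the two-section pattern concrete, here is a minimal runnable sketch. It keeps only the first two conv blocks of the posted model and assumes 3-channel 97x97 inputs (the size implied by the shape comments in the original code); the `128 * 47 * 47` value plays the role of the `??` above:

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv_layers = nn.Sequential(
            # (3, 97, 97) -> (64, 95, 95)
            nn.Conv2d(3, 64, 3, stride=1, padding=0, bias=False),
            nn.Tanh(),
            # (64, 95, 95) -> (128, 47, 47)
            nn.Conv2d(64, 128, 5, stride=2, padding=1, bias=False),
            nn.Tanh(),
        )
        # 128 * 47 * 47 is the flattened conv output size for this architecture
        self.fc_layers = nn.Sequential(
            nn.Linear(128 * 47 * 47, 1, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = self.conv_layers(x)
        x = x.view(x.size(0), -1)  # flatten everything except the batch dim
        return self.fc_layers(x)

d = Discriminator()
out = d(torch.randn(4, 3, 97, 97))
print(out.shape)  # torch.Size([4, 1])
```

Note that flattening with `x.view(x.size(0), -1)` keeps the batch dimension explicit, so each sample in the batch is flattened independently; the batch size never enters the layer definitions.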

@vmirly1 Thank you for the detailed response. This makes perfect sense. In fact this is what I was thinking as well. But I was not sure.

Now, if there are some developers around: would it be too complex to abstract away this conversion, so that one could use both convolutional and linear layers in the same Sequential module?

Here’s an idea…
To get more flexibility in your architecture you could use nn.ModuleList and construct the pipeline yourself.
Put all the layers in one list, and in another list the shape that each layer's output should be reshaped to before the next layer:

  self.layers = nn.ModuleList(modules=[nn.Linear(,),  nn.Conv2d(,,,), ..])
  self.shapes = [(,,), (,,,),  ..]

then thread all the parts together:

def forward(self, x):
  for layer, shape in zip(self.layers, self.shapes):
    x = layer(x).reshape(shape)  # reshape for the next layer if needed
  return x
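For illustration, a filled-in version of this pattern; the layer sizes here are made up and not taken from the thread, just small enough to run quickly:

```python
import torch
import torch.nn as nn

class Pipeline(nn.Module):
    def __init__(self):
        super().__init__()
        # made-up layer sizes, purely to illustrate the ModuleList + shapes idea
        self.layers = nn.ModuleList([
            nn.Conv2d(1, 8, 3, padding=1),  # (N, 1, 8, 8) -> (N, 8, 8, 8)
            nn.Linear(8 * 8 * 8, 10),       # expects a flat input
        ])
        # target shape after each layer; -1 keeps the batch dim flexible
        self.shapes = [(-1, 8 * 8 * 8), (-1, 10)]

    def forward(self, x):
        for layer, shape in zip(self.layers, self.shapes):
            x = layer(x).reshape(shape)  # reshape for the next layer if needed
        return x

p = Pipeline()
out = p(torch.randn(4, 1, 8, 8))
print(out.shape)  # torch.Size([4, 10])
```

One caveat of this design: the reshape targets are fixed at construction time, so changing the input resolution still requires recomputing the entries of `self.shapes` by hand.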

I hope that helps.

Thank you @sparseinference. I will look into it; in the future this might certainly be of help. For now I am using the method suggested by Vahid.

You have a new choice now: you can try the LazyLinear layer, which infers its input size from the first batch it sees.
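A minimal sketch of that approach (nn.LazyLinear requires PyTorch 1.8+). Combined with nn.Flatten, it also lets conv and linear layers share a single Sequential; the 97x97 input size is an assumption matching the shape comments in the original post:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 64, 3, stride=1, padding=0, bias=False),
    nn.Tanh(),
    nn.Flatten(),      # flattens all dims except the batch dim
    nn.LazyLinear(1),  # in_features is inferred on the first forward pass
    nn.Sigmoid(),
)

out = model(torch.randn(4, 3, 97, 97))  # first call materializes the Linear weight
print(out.shape)                # torch.Size([4, 1])
print(model[3].weight.shape)    # torch.Size([1, 577600]), i.e. 64 * 95 * 95
```

Before the first forward pass the LazyLinear weight is an uninitialized parameter, so the model cannot be deep-copied or its parameters passed to an optimizer until one dummy batch has been run through it.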