How to automatically get in_features from nn.Conv2d to nn.Linear

Hi,

Is there any way to avoid declaring the in_features of an nn.Linear module explicitly? I am using Conv2d modules and I know that we need to use .view to flatten the activations, but can the in_features be determined automatically by extracting the incoming tensor shape?

The in_features depend on the shape of your input, so what could be done is to take the input shape as an argument, pass a random tensor through the conv layers, get the resulting shape, and initialize the linear layers using this shape.
That's basically the automatic way: pass an input through your model, print the shape right before the linear layer, and set the right in_features afterwards. At least that's my workflow if I'm too lazy to calculate the shape. :wink:


I am currently dealing with this. Do you have an example of the workflow you are suggesting ("At least that's my workflow if I'm too lazy to calculate the shape")?

Here is a simple example of this workflow:

# initial setup
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 6, 3, 1, 1)
        self.linear = nn.Linear(1, 10) # initialize with any value
        
    def forward(self, x):
        x = self.conv(x)
        x = x.view(x.size(0), -1)
        # unknown shape here as I don't want to calculate it manually
        print(x.shape)
        x = self.linear(x) # will raise a shape mismatch, but that's OK
        return x

model = MyModel()
x = torch.randn(1, 3, 224, 224)
out = model(x)
# > RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x301056 and 1x10)

# Now change the in_features to 301056 
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 6, 3, 1, 1)
        self.linear = nn.Linear(301056, 10)
        
    def forward(self, x):
        x = self.conv(x)
        x = x.view(x.size(0), -1)
        x = self.linear(x)
        return x

model = MyModel()
out = model(x) # works

In the first iteration I'm expecting a shape mismatch, since I skipped the shape calculation, and will refine the in_features afterwards.
Of course one could calculate the output shape of the single conv layer manually, but this is just meant as an example for the case where the preceding layers are more complicated.
In the latest PyTorch releases you could also use the "lazy" layers, e.g. nn.LazyLinear, which don't expect the in_features anymore.
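
E.g. a minimal sketch of the same toy model using nn.LazyLinear (the module is available in recent releases; everything else mirrors the example above):

import torch
import torch.nn as nn

class LazyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 6, 3, 1, 1)
        self.linear = nn.LazyLinear(10)  # in_features is inferred on the first forward pass

    def forward(self, x):
        x = self.conv(x)
        x = x.view(x.size(0), -1)
        return self.linear(x)

model = LazyModel()
out = model(torch.randn(1, 3, 224, 224))  # materializes the weight as (10, 301056)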

Hi @ptrblck
Does lazy definition work?

class Model(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.conv = nn.Conv2d(3, 6, 3, 1, 1)
        self.linear = None 
        self.num_classes = num_classes

    def forward(self, x):
        x = self.conv(x)
        x = x.view(x.size(0), -1)
        if self.linear is None:
            self.linear = nn.Linear(x.size(1), self.num_classes)
        x = self.linear(x)
        return x 

Your approach would generally work, but you would have to be careful about when you create the optimizer, since the parameters are not fully initialized during the instantiation of the model.
E.g. this standard approach would be wrong:

model = Model(num_classes=10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

since model.parameters() does not yet contain the parameters of self.linear.
You would thus have to use:

model = Model(num_classes=10)
dummy_input = torch.randn(1, 3, 224, 224)
out = model(dummy_input)  # the dummy forward pass creates self.linear
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

The same limitation applies to the usage of Lazy* modules as described here.

For your manual approach you would also have to take care of pushing the module to the device after it is created in the first forward pass, which is handled automatically by the Lazy* modules.
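
E.g. a minimal sketch of that manual device handling, reusing the Model class from above (the .to(x.device) call is the addition):

    def forward(self, x):
        x = self.conv(x)
        x = x.view(x.size(0), -1)
        if self.linear is None:
            # create the layer lazily and move it to the same device as the activation
            self.linear = nn.Linear(x.size(1), self.num_classes).to(x.device)
        x = self.linear(x)
        return x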

Thanks for your reply @ptrblck.

However, I am still wondering how this is automatic?

x = torch.randn(1, 3, 224, 224)

# Now change the in_features to 301056 
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 6, 3, 1, 1)
        self.linear = nn.Linear(301056, 10)
        
    def forward(self, x):
        x = self.conv(x)
        x = x.view(x.size(0), -1)
        x = self.linear(x)
        return x

model = MyModel()
out = model(x) # works

Then if I change the input size, it fails:

x = torch.randn(1, 3, 128, 128)

And to make it work I have to manually change self.linear to:

self.linear = nn.Linear(98304, 10)
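
For reference, here is how the two values come about (the conv with kernel_size=3, stride=1, padding=1 keeps the spatial resolution, so the flattened size is out_channels * H * W):

print(6 * 224 * 224)  # 301056 for the 224x224 input
print(6 * 128 * 128)  # 98304 for the 128x128 input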

Is this the expected behaviour?

Is the following the right approach if I want to make it automatic? Or is there a better approach?

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 6, 3, 1, 1)
        
    def forward(self, x):
        x = self.conv(x)
        x = x.view(x.size(0), -1)  # flatten before reading the feature size
        linear1 = nn.Linear(x.size(1), 10)  # create the layer from the flattened size
        x = linear1(x)
        return x

It’s not an automatic approach and I’m not sure if auto-correct changed the phrasing back in 2018 or what I meant back then.
In any case, you would have these approaches:

  • calculate the output activation shapes and set the right values
  • run a dummy forward pass, print the activation shapes, and set the expected values
  • use the Lazy* layers to avoid setting the in_features manually

Your approach would have the same shortcomings described before, i.e. be careful about when you pass the parameters to the optimizer and make sure they are moved to the right device.


Thanks for your answer @ptrblck! I think I found a solution that can work from here:

def _infer_flat_size(self, model):
    model = nn.Sequential(*model)
    model_output = model(torch.ones(1, *self.input_size))
    return int(np.prod(model_output.size()[1:]))  

So I use this function to calculate the flattened size and then use it to set the in_features.

Hi again @ptrblck.

So I went with the option: run a dummy forward pass, print the activation shapes, and set the expected values.

def _infer_flat_size(self, model):
    model = nn.Sequential(*model)
    model_output = model(torch.ones(1, *self.input_size))
    return int(np.prod(model_output.size()[1:])) 

I then tried an experiment with a setup (fixed seeds) where I know I can reproduce the exact same result. Here I ran two different experiments:

  1. One where I initialized the in_features with a specific value (e.g. 301056).
  2. One where I used the _infer_flat_size() function to calculate the value and set it automatically.

But I get different training_loss, training_accuracy, etc. between the two runs. How come? Is it because of the following line?

    model_output = model(torch.ones(1, *self.input_size))

If you are using e.g. batchnorm layers, the dummy forward pass should be performed in model.eval() mode, as otherwise the fake input would be used to update the running stats of all batchnorm layers.
Could this be the case?
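
E.g. a minimal sketch of your helper with the dummy pass in eval mode (the torch.no_grad() guard is an additional suggestion so no autograd graph is built for the fake input):

def _infer_flat_size(self, model):
    model = nn.Sequential(*model)
    model.eval()  # disable batchnorm running-stat updates (and dropout) for the dummy pass
    with torch.no_grad():
        model_output = model(torch.ones(1, *self.input_size))
    return int(np.prod(model_output.size()[1:]))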

I made a simplified version of my code below. Here I added the model.eval() part as you suggested @ptrblck (or at least I think this is what you suggested). However, I still don't see the models being reproducible and I am wondering why that is. Is there something else I should add?

import numpy as np
import torch
import torch.nn as nn

class Network(nn.Module):
    def __init__(self, input_size):
        # calling constructor of parent class
        super(Network, self).__init__()
        self.input_size = input_size
        model = []
        model += [
            nn.BatchNorm2d(1),
            nn.Conv2d(in_channels=1, out_channels=50, kernel_size=(5, 5), stride=1, padding=0),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(num_features=50),
            nn.Flatten()
        ]
        # Use this helper, or use nn.LazyLinear (newer PyTorch), which doesn't need in_features
        self.flat_size = self._infer_flat_size(model)
        model += [
            nn.Linear(in_features=self.flat_size, out_features=450),
            nn.ReLU(inplace=True),
            nn.Linear(in_features=450, out_features=10)
        ]
        self.model = nn.Sequential(*model)

    def _infer_flat_size(self, model):
        model = nn.Sequential(*model)
        model.eval()
        model_output = model(torch.ones(1, *self.input_size))
        return int(np.prod(model_output.size()[1:]))

    def forward(self, x):
        return self.model(x)

  1. kernel_size=5 with padding=0 will eat two rows and two columns from each side in the convolution, so after the first Conv2d your image dimensions will be (H-4, W-4)
  2. out_channels = 50
  3. thus the in_features of your nn.Linear after the flatten will be (H-4) * (W-4) * out_channels, i.e. (input_size[1] - 4) * (input_size[2] - 4) * 50 with input_size = (C, H, W); see the sketch below
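
A small sketch of this calculation as a helper function (the function name and the 28x28 example input are just for illustration):

def flat_size(h, w, out_channels=50, kernel_size=5):
    # Conv2d with stride=1 and padding=0 shrinks each spatial dim by (kernel_size - 1)
    shrink = kernel_size - 1
    return (h - shrink) * (w - shrink) * out_channels

print(flat_size(28, 28))  # 24 * 24 * 50 = 28800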

Also, what @ptrblck is proposing is much easier: fill in_features with some arbitrary value, run a dummy pass (I prefer torchinfo.summary in this case), and let torch calculate these values for you in the form of an error.

The network is not meant to work @my3bikaht, it is just meant to show how I use model.eval() and the _infer_flat_size function. I just randomly inserted some numbers.

Can you give an example of what you are suggesting?

import torch.nn as nn
import torchinfo

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Conv2d(1, 50, kernel_size=5, stride=1, padding=0),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(50),
            nn.Flatten(),
            nn.Linear(9999, 450),  # arbitrary placeholder for in_features
            nn.ReLU(inplace=True),
            nn.Linear(450, 10)
        )

    def forward(self, x):
        return self.model(x)

model = Network()
torchinfo.summary(model, (32, 1, 224, 224))

Just run it like this once and torch will give you an error telling you what shape it expects for the nn.Linear layer.

Or, knowing the input shape, you can calculate the output shape yourself (I prefer padding='same' for these situations):

class Network(nn.Module):
    def __init__(self, input_shape=(224, 224), conv_out_channels=50):
        super().__init__()
        # with padding='same' the conv keeps the spatial size,
        # so the flattened feature count is simply H * W * out_channels
        shape_after_conv = input_shape[0] * input_shape[1] * conv_out_channels
        self.model = nn.Sequential(
            nn.Conv2d(1, conv_out_channels, kernel_size=5, stride=1, padding='same'),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(conv_out_channels),
            nn.Flatten(),
            nn.Linear(shape_after_conv, 450),
            nn.ReLU(inplace=True),
            nn.Linear(450, 10)
        )

    def forward(self, x):
        return self.model(x)

It's not like your data will have different shapes every day. Even traditional 'default' models like ResNets have a predefined input shape such as 224x224; tinkering with it is possible, but there's very little reason to, since the model is pretrained for this exact shape.