Inferring shape via flatten operator

Is there a flatten-like operator to calculate the shape of a layer output? An example would be transitioning from a conv layer to a linear layer. In all the examples I’ve seen thus far this seems to be calculated manually, e.g.:

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = F.relu(self.fc2(x))
        # dim must be given explicitly; dim=1 is the class dimension
        return F.log_softmax(x, dim=1)

What would be idiomatic torch to calculate 320?

You can use -1 in view so that the remaining dimension is calculated automatically, but instead of using it for the first dimension, which you already know (it is the batch size), you use it for the other one.
For example:

import torch

bs = 5
x = torch.rand(bs, 3, 224, 224)
# keep the batch dimension, let view infer the flattened size
x = x.view(x.size(0), -1)
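
The same flattening can also be written with torch.flatten or the nn.Flatten module, both of which keep the batch dimension and merge the rest. A minimal sketch, assuming a reasonably recent PyTorch (nn.Flatten is missing from very old releases); the tensor sizes are simply the ones used in the snippet above:

```python
import torch
import torch.nn as nn

bs = 5
x = torch.rand(bs, 3, 224, 224)

a = x.view(x.size(0), -1)          # explicit batch size, -1 infers the rest
b = torch.flatten(x, start_dim=1)  # flatten everything after the batch dim
c = nn.Flatten()(x)                # module form, usable inside nn.Sequential

print(a.shape, b.shape, c.shape)
```

All three only reshape the activations in forward; none of them tells nn.Linear its in_features at construction time.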

Thanks, that works for the forward method but I’m more concerned with the network definition, self.fc1 = nn.Linear(320, 50). How do I calculate the 320 there using torch?

Hmm, I’m afraid you can’t calculate that in __init__ without prior knowledge of the input shape.
You could imagine passing the input shape as an argument to __init__. In that case, you can infer the size by performing a forward pass over the convolutional blocks. Something like:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self, input_shape=(1, 28, 28)):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()

        # flattened size of the conv output for the given input shape
        n_size = self._get_conv_output(input_shape)

        self.fc1 = nn.Linear(n_size, 50)
        self.fc2 = nn.Linear(50, 10)

    # generate input sample and forward to get shape
    def _get_conv_output(self, shape):
        bs = 1
        # no_grad: this is only a shape probe, so no autograd graph is recorded
        with torch.no_grad():
            input = torch.rand(bs, *shape)
            output_feat = self._forward_features(input)
        n_size = output_feat.view(bs, -1).size(1)
        return n_size

    def _forward_features(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        return x

    def forward(self, x):
        x = self._forward_features(x)
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = F.relu(self.fc2(x))
        return F.log_softmax(x, dim=1)

You can try the functional API, which is truly dynamic.

Hm yeah I did something similar. You definitely have to know the input size. The only other option is to write a general function which calculates the shape using conv rules without having to actually run the graph…

import numpy as np
import torch
import torch.nn as nn
from torch.autograd import Variable

class Network(nn.Module):

    def __init__(self, input_size=(3,32,32)):
        super(Network, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3,32,3),
            nn.ReLU(),
            nn.Conv2d(32,32,3),
            nn.ReLU(),
            nn.MaxPool2d((3,3))
        )
        self.flat_fts = self.get_flat_fts(input_size, self.features)
        self.classifier = nn.Sequential(
            nn.Linear(self.flat_fts, 100),
            nn.Dropout(p=0.2),
            nn.ReLU(),
            nn.Linear(100,10),
            nn.LogSoftmax(dim=1)
        )

    def get_flat_fts(self, in_size, fts):
        # push a dummy batch of one through the feature layers and read off the size
        # (Variable is a no-op wrapper since PyTorch 0.4; a plain tensor works as well)
        f = fts(Variable(torch.ones(1,*in_size)))
        return int(np.prod(f.size()[1:]))

    def forward(self, x):
        fts = self.features(x)
        flat_fts = fts.view(-1, self.flat_fts)
        out = self.classifier(flat_fts)
        return out

@ncullen93 when you do fts(Variable(torch.ones(1,*in_size))), you are actually performing a forward pass on your network.

@fmassa right… That’s what you do in your example as well, I think? I’m saying it would be nice to not have to do that.

Yes, that’s what I do in my example.

Note that you can add a global operation (like global max/average pooling) just before your view layer, so that you know precisely the number of inputs that the linear layer will receive (as you can see in the resnet model definition, where the kernel size for the pooling can be computed on the fly using the functional interface).
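
As a rough sketch of the global-pooling idea: with a global average pool in front of the view, the linear layer's input size equals the number of channels, whatever the spatial size of the input. The class name PooledNet is made up for this illustration, and nn.AdaptiveAvgPool2d is used here as one convenient way to pool globally; the functional form F.avg_pool2d(x, x.size()[2:]) would compute the kernel size on the fly instead.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PooledNet(nn.Module):  # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        # global average pooling: output is always (N, 20, 1, 1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        # in_features is just the channel count of conv2
        self.fc = nn.Linear(20, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2(x), 2))
        x = self.pool(x)
        x = x.view(x.size(0), -1)
        return F.log_softmax(self.fc(x), dim=1)

net = PooledNet()
out_small = net(torch.rand(2, 1, 28, 28))
out_large = net(torch.rand(2, 1, 64, 64))
print(out_small.shape, out_large.shape)
```

The trade-off is that spatial information is averaged away before the classifier, which may or may not suit a given model.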

We could eventually add another method to each Function that, given an input shape and a set of parameters, returns an output shape, but is it really worth it?

@apaszke what do you think?
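
For illustration, the shape rule itself is short enough to write by hand. The helper below is a hypothetical sketch, not a PyTorch API; it applies the output-size formula given in the nn.Conv2d and nn.MaxPool2d documentation (floor mode) to the network from the original question, assuming a 28x28 input:

```python
def conv_out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial dim for a conv or pooling layer (floor mode)."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

# (kernel_size, stride) for conv1, max_pool2d, conv2, max_pool2d
layers = [(5, 1), (2, 2), (5, 1), (2, 2)]

side = 28  # assumed input height and width
for kernel_size, stride in layers:
    side = conv_out_size(side, kernel_size, stride)

n_flat = 20 * side * side  # 20 output channels from conv2
print(n_flat)  # 320
```

This reproduces the 320 without running the graph, but every layer type needs its own rule, which is the maintenance cost mentioned above.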

@fmassa we’ll need that for lazy execution, but I don’t know what the API’s going to look like yet.

Actually no, we don’t need that. I think it’s too much work and maintenance for very small gains. You probably want to do that extra forward once.

This should be a feature; it seems weird to have to do a forward pass.

Keras and Lasagne can do it very easily.

Can you please tell me what is bs here?

bs stands for batch_size, which is referred to further down as x.size(0).

@apaszke If I understand correctly, the additional forward pass has no effect on the gradient computation in the later training phase, right? And will the memory be released automatically (since we do not do a backward pass, the nodes will not be freed)?
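
One way to check this, as a small sketch rather than an authoritative answer: if the probing forward pass runs under torch.no_grad(), no autograd graph is recorded for it, so there is nothing to backpropagate through later and no buffers are kept for a backward pass. The layer sizes below are arbitrary.

```python
import torch
import torch.nn as nn

features = nn.Sequential(
    nn.Conv2d(1, 10, kernel_size=5),
    nn.MaxPool2d(2),
    nn.ReLU(),
)

# shape probe only: no graph is built inside this block
with torch.no_grad():
    probe = features(torch.rand(1, 1, 28, 28))

n_size = probe.view(1, -1).size(1)
no_grads_yet = all(p.grad is None for p in features.parameters())

print(n_size)
print(probe.requires_grad)
print(no_grads_yet)
```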

Guys, any update on this feature?

I would also like to know the answer to Xingdong_Zuo’s question.

Thanks!

As of 1.8, PyTorch has LazyLinear, which infers the input dimension:

A torch.nn.Linear module where in_features is inferred.
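
A minimal sketch of the network from the original question using nn.LazyLinear, assuming PyTorch 1.8 or later. The class name LazyNet and the 28x28 input are assumptions made for the example; in_features is filled in on the first forward call.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LazyNet(nn.Module):  # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        # only out_features is given; in_features is inferred later
        self.fc1 = nn.LazyLinear(50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = F.relu(self.fc2(x))
        return F.log_softmax(x, dim=1)

net = LazyNet()
out = net(torch.rand(4, 1, 28, 28))  # first call materializes fc1's weights
print(net.fc1.in_features)  # 320
```

Note that the weights of fc1 do not exist until that first call, so it is probably safest to run one dummy batch through the model before building the optimizer.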
