Is there a flatten-like operator for calculating the shape of a layer's output? An example would be the transition from a conv layer to a linear layer. In all the examples I’ve seen so far, this seems to be calculated manually, e.g.:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = F.relu(self.fc2(x))
        return F.log_softmax(x)

You can use -1 in view so that the remaining dimension is calculated automatically. Instead of putting it in the first dimension, which you already know is the batch size, put it in the other one.
For example:

bs = 5
x = torch.rand(bs, 3, 224, 224)
x = x.view(x.size(0), -1)
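
As a small sketch of the same idea: more recent PyTorch releases also provide torch.flatten and nn.Flatten, which should give the same result as the view call above when everything after the batch dimension is flattened. The tensor sizes are just the ones from the example; treat this as an illustration rather than the only way to do it.

```python
import torch
import torch.nn as nn

bs = 5
x = torch.rand(bs, 3, 224, 224)

# Three ways to collapse everything after the batch dimension
a = x.view(x.size(0), -1)          # -1 lets PyTorch infer 3 * 224 * 224
b = torch.flatten(x, start_dim=1)  # functional form
c = nn.Flatten()(x)                # module form, usable inside nn.Sequential

print(a.shape, b.shape, c.shape)
```

Note that none of these tell you the flattened size ahead of time; they only infer it once a tensor is available.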

Thanks, that works for the forward method but I’m more concerned with the network definition, self.fc1 = nn.Linear(320, 50). How do I calculate the 320 there using torch?

Hmm, I’m afraid you can’t calculate that in __init__ without prior knowledge of the input shape.
You could pass the input shape as an argument to __init__. In that case, you can infer the size by performing a forward pass over the convolutional blocks. Something like:

import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self, input_shape=(1, 28, 28)):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        n_size = self._get_conv_output(input_shape)
        self.fc1 = nn.Linear(n_size, 50)
        self.fc2 = nn.Linear(50, 10)

    # generate input sample and forward to get shape
    def _get_conv_output(self, shape):
        bs = 1
        # no_grad: the dummy pass does not record an autograd graph
        with torch.no_grad():
            input = torch.rand(bs, *shape)
            output_feat = self._forward_features(input)
        n_size = output_feat.view(bs, -1).size(1)
        return n_size

    def _forward_features(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        return x

    def forward(self, x):
        x = self._forward_features(x)
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = F.relu(self.fc2(x))
        # dim=1: normalize over the class dimension
        return F.log_softmax(x, dim=1)

Hm yeah I did something similar. You definitely have to know the input size. The only other option is to write a general function which calculates the shape using conv rules without having to actually run the graph…
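
A rough sketch of what such a general function could look like, using the standard output-size rule for Conv2d and for max pooling with the default floor rounding. The helper names (conv2d_out_size, flat_features) are made up for this illustration, and the layer list is hard-coded to match the network above.

```python
def conv2d_out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension of a conv or max-pool layer."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def flat_features(input_hw=(28, 28), out_channels=20):
    """Number of inputs fc1 receives, computed without running the graph."""
    h, w = input_hw
    # (kernel_size, stride) for conv1, max_pool2d(2), conv2, max_pool2d(2);
    # max_pool2d defaults its stride to the kernel size
    for kernel_size, stride in [(5, 1), (2, 2), (5, 1), (2, 2)]:
        h = conv2d_out_size(h, kernel_size, stride)
        w = conv2d_out_size(w, kernel_size, stride)
    return out_channels * h * w


print(flat_features())  # 320
```

For a 28x28 input the spatial size goes 28 -> 24 -> 12 -> 8 -> 4, so 20 channels * 4 * 4 gives the 320 from the original definition. The downside is that the layer list has to be kept in sync with the model by hand.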

Note that you can add a global operation (like global max/average pooling) just before your view layer, so that you know precisely the number of inputs that the linear layer will receive (as you can see in the resnet model definition, where the kernel size for the pooling can be computed on the fly using the functional interface).
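
A minimal sketch of the global pooling idea, assuming the same two conv layers as above. PooledNet is a made-up name for this example, and F.adaptive_avg_pool2d with output size 1 is used here as one way to average over the whole feature map without spelling out the kernel size; it is not taken from the resnet definition itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PooledNet(nn.Module):  # hypothetical example model
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        # After global pooling there is one value per channel,
        # so the linear layer always receives 20 inputs
        self.fc = nn.Linear(20, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2(x), 2))
        x = F.adaptive_avg_pool2d(x, 1)  # (N, 20, H, W) -> (N, 20, 1, 1)
        x = x.view(x.size(0), -1)        # (N, 20)
        return F.log_softmax(self.fc(x), dim=1)


net = PooledNet()
for size in (28, 64):
    out = net(torch.rand(2, 1, size, size))
    print(size, tuple(out.shape))
```

With this layout the linear layer's input size depends only on the number of channels, so the same model accepts different input resolutions (as long as they are large enough for the conv and pooling layers).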

We could eventually add another method to each Function that, given an input shape and a set of parameters, returns an output shape, but is it really worth it?

@apaszke If I understand correctly, the additional forward pass has no effect on the gradient computation in the later training phase, right? And will the memory be released automatically (since we never do a backward pass for it, the graph nodes will not be freed that way)?