I am printing a summary of a model using the summary function from torchsummary, but in the 'Output Shape' column for my MNIST model the first dimension of every layer is shown as -1.
The -1 in your example would represent an arbitrary number for the batch dimension.
While the other dimensions are fixed to the shown values, you can use any batch size for this model.
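For example (a minimal sketch with a hypothetical two-layer model, not the model from this thread), the same module accepts inputs with different batch sizes without any change:

import torch
import torch.nn as nn

# hypothetical model just to illustrate the flexible batch dimension
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))

out_a = model(torch.randn(4, 784))    # batch size 4
out_b = model(torch.randn(256, 784))  # batch size 256
print(out_a.shape, out_b.shape)       # torch.Size([4, 10]) torch.Size([256, 10])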
I am not sure that is the case. When using torchsummary, the batch size is shown in the second position, right? In this example, the second column of the tensor appears to be powers of two, and the other two correspond to the H and W of the images in MNIST.
I do not have the answer; I actually have the same question. Can it be related to some flattening operation instead?
I tested my own architecture with several torchvision datasets, and for all of them I get this -1 in the first column no matter which batch size I choose.
For a new dataset that I am interested in, and for which I needed to write a custom Dataset class, I get weird results; in addition, torchsummary shows that my layers lack that initial -1. I am trying to get it back, which is why I need to understand where it comes from.
I’m assuming that summary() outputs the tensor shapes in the default format.
For 2-dimensional layers, such as nn.Conv2d and nn.MaxPool2d, the expected shape is given as [batch_size, channels, height, width]. dim1 would therefore correspond to the channels, which are often chosen to be powers of 2 for performance reasons (“good” indexing is easier for powers of 2).
The -1 would therefore be the batch dimension, which is flexible in PyTorch. I.e. you don’t have to specify the batch size for your model and it will take variable batches as long as you have enough memory etc.
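As a minimal sketch (the small model below is hypothetical, and the exact table layout depends on your torchsummary version), summarizing a 2-dimensional model on MNIST-sized input would report shapes of the form [-1, channels, height, width] when the batch size is left at its default:

import torch.nn as nn
from torchsummary import summary

# hypothetical model for 1x28x28 (MNIST-sized) inputs
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3),  # reported as [-1, 16, 26, 26]
    nn.ReLU(),
    nn.MaxPool2d(2),                  # reported as [-1, 16, 13, 13]
    nn.Flatten(),
    nn.Linear(16 * 13 * 13, 10),      # reported as [-1, 10]
)

summary(model, input_size=(1, 28, 28), device="cpu")  # batch_size left at its default of -1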
Could you post your result and (if possible) your model definition, so that we could have a look?
# Define the network and forward prop
import torch.nn as nn
import torch.nn.functional as F
from torchsummary import summary

dim = 12288      # 64 * 64 * 3 channels
n_h = 7          # hidden units
n_classes = 2    # 2 classes
n_y = 5          # units in the second hidden layer

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.input_layer = nn.Linear(dim, n_h)
        self.hidden_layer = nn.Linear(n_h, n_y)
        self.output_layer = nn.Linear(n_y, n_classes)

    def forward(self, x):
        x = x.view(-1, dim)  # flatten the image input
        x = F.relu(self.input_layer(x))
        x = F.relu(self.hidden_layer(x))
        x = self.output_layer(x)  # no ReLU on the last layer
        return F.log_softmax(x, dim=1)  # convert logits to log-probabilities

bs = 64  # example batch size (the value used in the original post is not shown)
net = Net()
print(net)
summary(Net(), input_size=(1, dim), batch_size=bs, device="cpu")
net.state_dict()
Thanks for the code!
In your first example you are passing batch_size=bs directly to summary(), which will then print that value as the first dimension.
If you skip it and leave it at -1 (the default), you'll get -1 printed as the batch dimension instead.
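A minimal sketch of the difference (reusing the Net and dim from your code; bs is whatever value you assigned):

summary(Net(), input_size=(1, dim), batch_size=bs, device="cpu")  # first dimension printed as bs
summary(Net(), input_size=(1, dim), device="cpu")                 # batch_size defaults to -1, so -1 is printed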