ResNet - mismatch of tensor shape

Hi there,

I am looking at the model returned by torchvision.models.resnet50(pretrained=False). In the section of the printout below, the output of the BatchNorm layer bn3 in Bottleneck (0) has 256 channels, yet the Conv2d right below it, inside the downsample Sequential, expects 64 input channels.

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )

Specifically, I am looking at this portion:

      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)

When I run this model in the following manner:

import torch 
import torchvision
net = torchvision.models.resnet50(pretrained=False)
y = net(torch.randn(1,3,512,512))

It runs without any problem.

But if I run it in the following manner, where I try to feed each layer's output into the next layer manually, I get the error I would expect from the mismatch above:

Given groups=1, weight[64, 64, 1, 1], so expected input[1, 256, 56, 56] to have 64 channels, but got 256 channels instead

modules = list(net.modules())  # flattened list of ALL modules, nested ones included
in_ = torch.randn(1, 3, 512, 512)
sizes = []
for i in range(1, len(modules)):  # skip modules[0], the ResNet container itself
    m = modules[i]
    out = m(in_)                  # feed each module the previous module's output
    sizes.append(torch.numel(out))
    in_ = out
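As an aside, a version of this experiment that does run end to end iterates only over the top-level children, so each Bottleneck runs its own forward() internally. A minimal sketch, assuming a torchvision version whose ResNet uses adaptive average pooling (which the successful full-model run on a 512x512 input above suggests); the explicit flatten mirrors what forward() does between avgpool and fc:

out = torch.randn(1, 3, 512, 512)
for name, m in net.named_children():  # top-level layers only, in definition order
    if name == 'fc':
        out = torch.flatten(out, 1)   # forward() flattens between avgpool and fc
    out = m(out)
print(out.shape)  # torch.Size([1, 1000])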

So my question is: when I run the model directly from torchvision, how does it succeed when the Conv layer in downsample expects 64 input channels while the output of the preceding BN layer (bn3) has 256 channels?

The model is not necessarily run in the sequential order in which its modules are printed.
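Concretely, in a Bottleneck block the downsample branch is applied to the block's original input (64 channels here), in parallel with the conv1-bn3 main path, and the two 256-channel results are summed before the final ReLU, so no 256-channel tensor is ever fed into downsample. Roughly what torchvision's Bottleneck.forward does (a paraphrase, not the exact source):

def forward(self, x):
    identity = x                       # 64-channel block input

    out = self.relu(self.bn1(self.conv1(x)))
    out = self.relu(self.bn2(self.conv2(out)))
    out = self.bn3(self.conv3(out))    # main path ends with 256 channels

    if self.downsample is not None:
        identity = self.downsample(x)  # applied to the ORIGINAL input: 64 -> 256

    out += identity                    # both branches have 256 channels, shapes match
    return self.relu(out)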

@SimonW is there a way to get the correct order somehow?

The execution order is defined by the user's code in forward(), so unfortunately there is no general way to recover it from the module structure alone.
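That said, for a concrete input you can observe the order in which modules actually fire by registering forward hooks on the leaf modules. A minimal sketch (the bookkeeping here is my own, not a torchvision facility):

import torch
import torchvision

net = torchvision.models.resnet50(pretrained=False)

order = []
handles = []
for name, m in net.named_modules():
    if len(list(m.children())) == 0:  # leaf modules only
        handles.append(m.register_forward_hook(
            lambda mod, inp, out, name=name: order.append(name)))

net(torch.randn(1, 3, 512, 512))
for h in handles:
    h.remove()

print(order[:6])
# e.g. ['conv1', 'bn1', 'relu', 'maxpool', 'layer1.0.conv1', 'layer1.0.bn1']

Note that a module reused several times in forward (e.g. each Bottleneck's single relu) appears in the list once per call, which is exactly the execution order for that input.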