Hi there,
I am inspecting the model returned by torchvision.models.resnet50(pretrained=False).
In the section of the model printed below, the output of the last batch normalization layer (bn3) in the (0) Bottleneck
block has 256 channels. However, the first Conv2d of the downsample Sequential right below it
expects 64 input channels.
ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
Specifically, I am looking at this portion:
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace)
(downsample): Sequential(
  (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
When I run this model in the following manner:
import torch
import torchvision
net = torchvision.models.resnet50(pretrained=False)
y = net(torch.randn(1,3,512,512))
It runs without any problem.
But if I run it as shown below, manually feeding each module's output to the next module, I get this error:
Given groups=1, weight[64, 64, 1, 1], so expected input[1, 256, 56, 56] to have 64 channels, but got 256 channels instead
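layers = list(net.modules())  # layers[0] is the ResNet itself, so it is skipped below
in_ = torch.randn(1, 3, 512, 512)
sizes = []
for m in layers[1:]:
    out = m(in_)  # this call eventually raises the channel-mismatch error quoted above
    sizes.append(torch.numel(out))
    in_ = out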
So my question is: when I run the model directly through torchvision, how does it work, given that the downsample Conv2d expects 64 input channels whereas the preceding BatchNorm2d layer (bn3) outputs 256 channels?