Modify a MaxPool2d layer parameter in pretrained VGG16

I wanted to change a parameter in a MaxPool2d layer of a VGG16 network from ceil_mode=False to ceil_mode=True to satisfy a strict output-shape requirement when reproducing a piece of code. Can this be done if I am using a pretrained vgg16 model from torchvision.models?

Hi Em!

Yes, this should work okay.

The question you should ask yourself when modifying the architecture
of a pretrained model is whether the parameters (weights, etc.) of the
modified model still mean more or less the same thing as they did in the
unmodified model.

In the case of your proposed VGG16 modification, the following points
are relevant:

  1. MaxPool2d does not contain any trainable parameters, so no
    issue there.

  2. As for the Conv2d layers, changing ceil_mode from False to
    True will sometimes slightly change the size of the “images”
    output by the MaxPool2d layers. But the Conv2d layers are
    pretty much agnostic about the spatial sizes of their inputs, so
    their weights retain essentially their original meaning.

  3. At the end, you have some fully-connected Linear layers. These
    expect inputs of a fixed size. But the fully-connected “classifier”
    section of VGG16 is preceded by an AdaptiveAvgPool2d layer
    that outputs an “image” of spatial size 7 x 7, regardless of whether
    your ceil_mode = True modifications have changed the size of
    the input to the AdaptiveAvgPool2d layer.
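Points 1 and 3 can be checked directly with standalone layers. This is a small sketch, independent of the pretrained network; the 61 x 61 input size is chosen just so that the ceil_mode rounding matters:

```python
import torch

# Point 1: MaxPool2d holds no trainable parameters, and ceil_mode only
# changes how an odd spatial size is rounded down or up when pooling.
x = torch.randn(1, 3, 61, 61)  # odd spatial size, so rounding matters

floor_pool = torch.nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=False)
ceil_pool = torch.nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True)

print(list(floor_pool.parameters()))  # [] -- nothing trainable
print(floor_pool(x).shape)            # torch.Size([1, 3, 30, 30])
print(ceil_pool(x).shape)             # torch.Size([1, 3, 31, 31])

# Point 3: AdaptiveAvgPool2d produces a fixed 7 x 7 output either way,
# so the classifier's Linear layers see the input size they expect.
avgpool = torch.nn.AdaptiveAvgPool2d((7, 7))
print(avgpool(floor_pool(x)).shape)   # torch.Size([1, 3, 7, 7])
print(avgpool(ceil_pool(x)).shape)    # torch.Size([1, 3, 7, 7])
```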

All in all, the modified architecture will still work, and the weights of the
layers will retain their original meanings, so their original pretrained
values will still make sense and work together properly with one another.
This remains true even though the sizes of various intermediate “images”
(sometimes called “feature maps”) will be modestly different as a result
of your ceil_mode = True modifications.
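To see how modest those size differences are, here is a quick sketch of the spatial size after each of VGG16’s five 2 x 2, stride-2 max-pool layers for a 244 x 244 input, comparing ceil_mode = False applied to every pool against ceil_mode = True applied to every pool (the exact sizes depend on which pools you actually modify):

```python
import math

size_floor = size_ceil = 244  # VGG16's conv layers preserve spatial size
for i in range(5):            # VGG16 has five 2 x 2, stride-2 max pools
    size_floor = math.floor(size_floor / 2)
    size_ceil = math.ceil(size_ceil / 2)
    print(f"after pool {i + 1}: floor_mode {size_floor}, ceil_mode {size_ceil}")

# floor_mode: 122, 61, 30, 15, 7    ceil_mode: 122, 61, 31, 16, 8
```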

See this illustration in which layer 23 (chosen more or less at random) of
VGG16’s features module has your ceil_mode = True modification
applied to it:

>>> import torch
>>> torch.__version__
>>> import torchvision
>>> torchvision.__version__
>>> _ = torch.manual_seed (2022)
>>> vgg = torchvision.models.vgg16 (pretrained = True)
Downloading: "" to C:\<path_to_.cache>\torch\hub\checkpoints\vgg16-397923af.pth
100%|███████████████████████████████████████████████████████████████████████████████| 528M/528M [00:39<00:00, 13.8MB/s]
>>> t = torch.randn (1, 3, 244, 244)
>>> vgg.features[23]
MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
>>> predA = vgg (t)
>>> vgg.features[23] = torch.nn.MaxPool2d (2, 2, ceil_mode = True)
>>> vgg.features[23]
MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=True)
>>> predB = vgg (t)
>>> predA[0, 0:10]
tensor([-1.1322,  2.0340, -1.0934, -1.5980, -0.4693,  0.9064,  0.4962,  0.7050,
         0.4435,  0.7932], grad_fn=<SliceBackward0>)
>>> predB[0, 0:10]
tensor([-0.2408,  2.8836, -1.5900, -1.4564, -0.6028,  2.5356,  0.9514,  0.7236,
         1.3219,  0.4366], grad_fn=<SliceBackward0>)
>>> vgg.avgpool
AdaptiveAvgPool2d(output_size=(7, 7))

Notice that the resulting prediction (for a random input) does change
when the network is modified, but that it still retains much of the structure
of the prediction made by the original network.
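One way to quantify “retains much of the structure” is the cosine similarity between the two predictions. Here is a small sketch using just the ten logits printed above (plain Python, since the full prediction vectors aren’t reproduced here):

```python
import math

# First ten logits of predA and predB, copied from the session above.
pred_a = [-1.1322, 2.0340, -1.0934, -1.5980, -0.4693,
           0.9064, 0.4962, 0.7050, 0.4435, 0.7932]
pred_b = [-0.2408, 2.8836, -1.5900, -1.4564, -0.6028,
           2.5356, 0.9514, 0.7236, 1.3219, 0.4366]

dot = sum(a * b for a, b in zip(pred_a, pred_b))
norm_a = math.sqrt(sum(a * a for a in pred_a))
norm_b = math.sqrt(sum(b * b for b in pred_b))

# Roughly 0.9 -- much closer to 1 than unrelated vectors would be.
print(dot / (norm_a * norm_b))
```

On the full 1000-element predictions you could compute the same quantity with torch.nn.functional.cosine_similarity (predA, predB).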


K. Frank