Modify a MaxPool2d layer parameter in pretrained VGG16

I wanted to change a parameter in a MaxPool2d layer of a VGG16 network from ceil_mode=False to ceil_mode=True to satisfy a strict output-shape requirement when reproducing a piece of code. Can this be done if I am using a pretrained vgg16 model from torchvision.models?

Hi Em!

Yes, this should work okay.

The question you should ask yourself when modifying the architecture
of a pretrained model is whether the parameters (weights, etc.) of the
modified model still mean more or less the same thing as they did in the
unmodified model.

In the case of your proposed VGG16 modification, the following points
are relevant:

  1. MaxPool2d does not contain any trainable parameters, so no
    issue there.

  2. As for the Conv2d layers, changing ceil_mode from False to
    True will sometimes slightly change the size of the “images”
    output by the MaxPool2d layers. But the Conv2d layers are
    pretty much agnostic about the spatial sizes of their inputs, so
    their weights retain essentially their original meaning.

  3. At the end, you have some fully-connected Linear layers. These
    expect inputs of a fixed size. But the fully-connected “classifier”
    section of VGG16 is preceded by an AdaptiveAvgPool2d layer
    that outputs an “image” of spatial size 7 x 7, regardless of whether
    your ceil_mode = True modifications have changed the size of
    the input to the AdaptiveAvgPool2d layer.
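Points 1 and 3 can be checked directly with standalone layers. This is a small sketch, independent of the pretrained network; the 61 x 61 input size is chosen just so that the ceil_mode rounding matters:

```python
import torch

# Point 1: MaxPool2d holds no trainable parameters, and ceil_mode only
# changes how an odd spatial size is rounded down or up when pooling.
x = torch.randn(1, 3, 61, 61)  # odd spatial size, so rounding matters

floor_pool = torch.nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=False)
ceil_pool = torch.nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True)

print(list(floor_pool.parameters()))  # [] -- nothing trainable
print(floor_pool(x).shape)            # torch.Size([1, 3, 30, 30])
print(ceil_pool(x).shape)             # torch.Size([1, 3, 31, 31])

# Point 3: AdaptiveAvgPool2d produces a fixed 7 x 7 output either way,
# so the classifier's Linear layers see the input size they expect.
avgpool = torch.nn.AdaptiveAvgPool2d((7, 7))
print(avgpool(floor_pool(x)).shape)   # torch.Size([1, 3, 7, 7])
print(avgpool(ceil_pool(x)).shape)    # torch.Size([1, 3, 7, 7])
```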

All in all, the modified architecture will still work, and the weights of the
layers will retain their original meanings, so their original pretrained
values will still make sense and work together properly with one another.
This remains true even though the sizes of various intermediate “images”
(sometimes called “feature maps”) will be modestly different as a result
of your ceil_mode = True modifications.
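To see how modest those size differences are, here is a quick sketch of the spatial size after each of VGG16’s five 2 x 2, stride-2 max-pool layers for a 244 x 244 input, comparing ceil_mode = False applied to every pool against ceil_mode = True applied to every pool (the exact sizes depend on which pools you actually modify):

```python
import math

size_floor = size_ceil = 244  # VGG16's conv layers preserve spatial size
for i in range(5):            # VGG16 has five 2 x 2, stride-2 max pools
    size_floor = math.floor(size_floor / 2)
    size_ceil = math.ceil(size_ceil / 2)
    print(f"after pool {i + 1}: floor_mode {size_floor}, ceil_mode {size_ceil}")

# floor_mode: 122, 61, 30, 15, 7    ceil_mode: 122, 61, 31, 16, 8
```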

See this illustration in which layer 23 (chosen more or less at random) of
VGG16’s features module has your ceil_mode = True modification
applied to it:

>>> import torch
>>> torch.__version__
>>> import torchvision
>>> torchvision.__version__
>>> _ = torch.manual_seed (2022)
>>> vgg = torchvision.models.vgg16 (pretrained = True)
Downloading: "" to C:\<path_to_.cache>\torch\hub\checkpoints\vgg16-397923af.pth
100%|███████████████████████████████████████████████████████████████████████████████| 528M/528M [00:39<00:00, 13.8MB/s]
>>> t = torch.randn (1, 3, 244, 244)
>>> vgg.features[23]
MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
>>> predA = vgg (t)
>>> vgg.features[23] = torch.nn.MaxPool2d (2, 2, ceil_mode = True)
>>> vgg.features[23]
MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=True)
>>> predB = vgg (t)
>>> predA[0, 0:10]
tensor([-1.1322,  2.0340, -1.0934, -1.5980, -0.4693,  0.9064,  0.4962,  0.7050,
         0.4435,  0.7932], grad_fn=<SliceBackward0>)
>>> predB[0, 0:10]
tensor([-0.2408,  2.8836, -1.5900, -1.4564, -0.6028,  2.5356,  0.9514,  0.7236,
         1.3219,  0.4366], grad_fn=<SliceBackward0>)
>>> vgg.avgpool
AdaptiveAvgPool2d(output_size=(7, 7))

Notice that the resulting prediction (for a random input) does change
when the network is modified, but that it still retains much of the structure
of the prediction made by the original network.
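One way to quantify “retains much of the structure” is the cosine similarity between the two predictions. Here is a small sketch using just the ten logits printed above (plain Python, since the full prediction vectors aren’t reproduced here):

```python
import math

# First ten logits of predA and predB, copied from the session above.
pred_a = [-1.1322, 2.0340, -1.0934, -1.5980, -0.4693,
           0.9064, 0.4962, 0.7050, 0.4435, 0.7932]
pred_b = [-0.2408, 2.8836, -1.5900, -1.4564, -0.6028,
           2.5356, 0.9514, 0.7236, 1.3219, 0.4366]

dot = sum(a * b for a, b in zip(pred_a, pred_b))
norm_a = math.sqrt(sum(a * a for a in pred_a))
norm_b = math.sqrt(sum(b * b for b in pred_b))

# Roughly 0.9 -- much closer to 1 than unrelated vectors would be.
print(dot / (norm_a * norm_b))
```

On the full 1000-element predictions you could compute the same quantity with torch.nn.functional.cosine_similarity (predA, predB).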


K. Frank