High GPU memory usage for ResNet models in PyTorch 0.1.11 compared to 0.1.10

I found two issues with PyTorch 0.1.11 when running the following encoder CNN:

  1. With param.requires_grad = False, PyTorch 0.1.10 uses only a small amount of GPU memory compared to 0.1.11. PyTorch 0.1.11 uses almost the same amount of GPU memory regardless of the requires_grad value.

  2. Even with param.requires_grad = True, PyTorch 0.1.10 uses about half the GPU memory that 0.1.11 does. Digging into the code, I found that in 0.1.11 the forward pass of the second batch doubles GPU memory usage, whereas in 0.1.10 the second batch adds only a negligible amount.

    import torch.nn as nn
    import torchvision.models as models


    class EncoderCNN(nn.Module):
        def __init__(self, embed_size):
            """Load the pretrained ResNet-152 and replace top fc layer."""
            super(EncoderCNN, self).__init__()
            self.resnet = models.resnet152(pretrained=True)
            # Freeze the pretrained backbone (issue 1 compares False vs. True).
            for param in self.resnet.parameters():
                param.requires_grad = False
                # param.requires_grad = True
            # The new fc layer is created after freezing, so it stays trainable.
            self.resnet.fc = nn.Linear(self.resnet.fc.in_features, embed_size)
            self.bn = nn.BatchNorm1d(embed_size, momentum=0.01)

        def init_weights(self):
            """Initialize the weights."""
            self.resnet.fc.weight.data.normal_(0.0, 0.02)

        def forward(self, images):
            """Extract the image feature vectors."""
            features = self.resnet(images)
            features = self.bn(features)
            return features
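A minimal sketch for reproducing this kind of comparison on a recent PyTorch release (it relies on torch.cuda.memory_allocated and nn.Flatten, which do not exist in 0.1.x). The helpers build_model and run_two_batches are hypothetical, and a tiny conv net stands in for ResNet-152 so nothing has to be downloaded. It freezes or unfreezes the backbone, runs two forward/backward batches, and reports the trainable parameter count and, on a GPU, the memory allocated after each batch.

```python
import torch
import torch.nn as nn


def build_model(freeze_backbone):
    # Hypothetical stand-in for ResNet-152: a small conv "backbone" plus a
    # linear head playing the role of the replaced fc layer.
    backbone = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
    )
    for param in backbone.parameters():
        param.requires_grad = not freeze_backbone
    head = nn.Linear(16, 8)  # always trainable, like the new fc layer
    return nn.Sequential(backbone, head)


def run_two_batches(model, device):
    """Run two forward/backward passes and return the memory allocated after
    each batch in bytes on CUDA, or None per batch on CPU (not tracked)."""
    model.to(device)
    usage = []
    for _ in range(2):
        images = torch.randn(4, 3, 32, 32, device=device)
        loss = model(images).sum()
        loss.backward()  # gradients accumulate; no optimizer step here
        if device.type == "cuda":
            torch.cuda.synchronize()
            usage.append(torch.cuda.memory_allocated(device))
        else:
            usage.append(None)
    return usage


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
for freeze in (True, False):
    model = build_model(freeze)
    usage = run_two_batches(model, device)
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"freeze_backbone={freeze}: trainable params={trainable}, "
          f"memory after each batch={usage}")
```

With a frozen backbone only the head receives gradients, so fewer gradient buffers should be allocated. If the figure after the second batch is much larger than after the first, memory is likely not being released between iterations, which matches the symptom described in issue 2.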

I also posted the problem here, with some background information.

We’re aware of the issue and it’s fixed in one of the open PRs.

Could you send me the link to the PR? Thanks.

This is the PR https://github.com/pytorch/pytorch/pull/1016

Thanks very much for the help

It seems that PyTorch 0.1.12 has the same issue. @apaszke @fmassa

Yes, the fix is only in master and will be part of the next release.

Got it. Thanks a lot…