High GPU memory usage for ResNet models in PyTorch 0.1.11 compared to 0.1.10

Hi,
I've found two issues with PyTorch 0.1.11 when running the following EncoderCNN.

  1. With param.requires_grad = False, PyTorch 0.1.10 uses only a small amount of GPU memory compared to 0.1.11. PyTorch 0.1.11 uses almost the same amount of GPU memory regardless of the requires_grad value.

  2. Even with param.requires_grad = True, PyTorch 0.1.10 uses about half the GPU memory of 0.1.11. When I dug into the code, I found that the second batch's forward pass doubles the GPU usage in 0.1.11, while in 0.1.10 memory usage increases only negligibly on the second batch.

    import torch.nn as nn
    from torchvision import models

    class EncoderCNN(nn.Module):
        def __init__(self, embed_size):
            """Load the pretrained ResNet-152 and replace the top fc layer."""
            super(EncoderCNN, self).__init__()
            self.resnet = models.resnet152(pretrained=True)
            for param in self.resnet.parameters():
                param.requires_grad = False
                # param.requires_grad = True
            self.resnet.fc = nn.Linear(self.resnet.fc.in_features, embed_size)
            self.bn = nn.BatchNorm1d(embed_size, momentum=0.01)
            self.init_weights()

        def init_weights(self):
            """Initialize the weights."""
            self.resnet.fc.weight.data.normal_(0.0, 0.02)
            self.resnet.fc.bias.data.fill_(0)

        def forward(self, images):
            """Extract the image feature vectors."""
            features = self.resnet(images)
            features = self.bn(features)
            return features
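As a sanity check on the freezing pattern itself (independent of the version-specific memory behavior above), here is a minimal sketch using a tiny stand-in model instead of ResNet-152: it freezes a "backbone", keeps a replacement head trainable, and confirms that after a backward pass only the head accumulates gradients. This uses the current PyTorch API; in the 0.1.x era one would also mark eval-time inputs with volatile=True on Variables, which torch.no_grad() replaced later.

```python
import torch
import torch.nn as nn

# Tiny stand-in "backbone" plus a new "fc" head, mirroring the
# EncoderCNN pattern above (hypothetical toy sizes, not ResNet-152).
backbone = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
fc = nn.Linear(4, 2)  # replacement head, stays trainable

for param in backbone.parameters():
    param.requires_grad = False  # freeze the backbone, as in EncoderCNN

x = torch.randn(3, 8)
out = fc(backbone(x))
out.sum().backward()

# Frozen parameters never accumulate gradients (grad stays None);
# only the head's weight and bias are trainable.
frozen_grads = [p.grad for p in backbone.parameters()]
trainable = [p for p in fc.parameters() if p.requires_grad]
print(all(g is None for g in frozen_grads), len(trainable))  # True 2
```

For pure feature extraction, wrapping the forward pass in torch.no_grad() additionally avoids building the autograd graph at all, which is what keeps activation memory from being retained between batches.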

@apaszke
I also posted the problem here with some background information.

We’re aware of the issue and it’s fixed in one of the open PRs.

Could you send me the link to the PR? Thanks

This is the PR https://github.com/pytorch/pytorch/pull/1016

Thanks very much for the help

It seems that PyTorch 0.1.12 has the same issue. @apaszke @fmassa

Yes, the fix is only in master and will be part of the next release.

Got it. Thanks a lot…