Description
While experimenting with pretrained models as visual feature extractors, I was surprised to hit GPU OOM errors with the mobile-optimized models.
I profiled with the simplified code below and got the results shown. MNASNet and MobileNet have far fewer parameters than ResNet34,
yet that does not translate into correspondingly lower GPU memory consumption. Is this reasonable or expected?
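For scale, here is my own back-of-envelope on parameter memory (fp32, 4 bytes per element), using the parameter counts from the profiling runs below; it shows the weights themselves can only account for a tiny fraction of the allocations:

```python
# Back-of-envelope: fp32 parameter memory alone (4 bytes per weight),
# using the parameter counts printed by the profiling run.
param_counts = {
    "ResNet34": 21_797_672,
    "MNASNet 1.0": 4_383_312,
    "MobileNetV2": 3_504_872,
}
for name, n in param_counts.items():
    print(f"{name}: ~{n * 4 / 2**20:.0f} MiB of fp32 weights")
# Even ResNet34's ~83 MiB of weights is small next to the ~1.7 GiB that
# memory_allocated reports, so most of the memory is not parameters.
```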
Example profiling
ResNet34Encoder: parameter#: 21,797,672
(0): mem_alloc: 1,772,397,056; max_mem_alloc: 1,842,586,624
(1): mem_alloc: 1,771,348,480; max_mem_alloc: 3,525,683,712
(2): mem_alloc: 1,772,397,056; max_mem_alloc: 3,525,683,712
(3): mem_alloc: 1,771,348,480; max_mem_alloc: 3,525,683,712
(4): mem_alloc: 1,772,397,056; max_mem_alloc: 3,525,683,712
MNASNet10Encoder: parameter#: 4,383,312
(0): mem_alloc: 2,359,172,096; max_mem_alloc: 4,131,510,784
(1): mem_alloc: 2,353,675,264; max_mem_alloc: 4,694,009,344
(2): mem_alloc: 2,354,584,576; max_mem_alloc: 4,694,009,344
(3): mem_alloc: 2,351,578,112; max_mem_alloc: 4,694,009,344
(4): mem_alloc: 2,358,516,736; max_mem_alloc: 4,694,009,344
MobileNetV2Encoder: parameter#: 3,504,872
(0): mem_alloc: 2,856,218,112; max_mem_alloc: 5,214,669,824
(1): mem_alloc: 2,857,602,560; max_mem_alloc: 5,699,356,160
(2): mem_alloc: 2,858,446,336; max_mem_alloc: 5,701,584,384
(3): mem_alloc: 2,851,966,464; max_mem_alloc: 5,701,584,384
(4): mem_alloc: 2,862,247,424; max_mem_alloc: 5,701,584,384
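One hypothesis on my part (not verified): MobileNetV2's inverted residual blocks expand channels sixfold before the depthwise convolution, so with a 10 x 3 x 512 x 512 input even a single expanded activation early in the network is large, and with autograd enabled each such tensor is kept around for backward:

```python
# Size of one fp32 activation in MobileNetV2's first expanded block for a
# batch of 10 at 512 x 512 input: the stem conv (stride 2) gives 256 x 256
# feature maps, and the block expands its 16 channels by a factor of 6 to 96.
batch, channels, height, width = 10, 16 * 6, 256, 256
nbytes = batch * channels * height * width * 4  # 4 bytes per fp32 element
print(f"{nbytes / 2**20:.0f} MiB for a single expanded activation")
```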
Example code to reproduce
import torch
import torch.nn as nn
from torchvision.models import resnet34, mnasnet1_0, mobilenet_v2


class ResNet34Encoder(nn.Module):
    def __init__(self):
        super(ResNet34Encoder, self).__init__()
        self.feature_extractor = resnet34(pretrained=True)

    def forward(self, x):
        x = self.feature_extractor(x)
        return x


class MNASNet10Encoder(nn.Module):
    def __init__(self):
        super(MNASNet10Encoder, self).__init__()
        self.feature_extractor = mnasnet1_0(pretrained=True)

    def forward(self, x):
        x = self.feature_extractor(x)
        return x


class MobileNetV2Encoder(nn.Module):
    def __init__(self):
        super(MobileNetV2Encoder, self).__init__()
        self.feature_extractor = mobilenet_v2(pretrained=True)

    def forward(self, x):
        x = self.feature_extractor(x)
        return x


def main():
    nets = [ResNet34Encoder(), MNASNet10Encoder(), MobileNetV2Encoder()]
    for net in nets:
        torch.cuda.empty_cache()
        torch.cuda.reset_max_memory_allocated()
        net.to('cuda')
        print(
            f'{net.__class__.__name__}: parameter#: {sum(p.numel() for p in net.parameters()):,}'
        )
        for n in range(5):
            x = torch.randn(10, 3, 512, 512)
            x = x.to('cuda')
            _ = net(x)  # forward only, but autograd is still enabled here
            print(
                f'({n}): mem_alloc: {torch.cuda.memory_allocated():,}; max_mem_alloc: {torch.cuda.max_memory_allocated():,}'
            )
        net.to('cpu')


if __name__ == "__main__":
    main()
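For reference, since these encoders are used purely as frozen feature extractors, the forward pass could run under torch.no_grad(), which tells autograd not to save intermediate activations for backward. A minimal sketch (run_inference is a hypothetical helper, not part of the code above):

```python
import torch
import torch.nn as nn


def run_inference(net: nn.Module, x: torch.Tensor) -> torch.Tensor:
    # Hypothetical variant of the inner loop above: eval mode plus no_grad,
    # so no intermediate activations are retained for a backward pass.
    net.eval()
    with torch.no_grad():
        return net(x)
```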