I’m trying to deploy multiple models on one GPU inside a single service. When I add one model on a GPU, the GPU memory consumption is 819M; when I add a second model on the same GPU, the GPU memory consumption increases by only 84M, to 903M. Is this expected behavior? Why?
Below is the demo code; I’m using PyTorch 1.0 on Ubuntu 18.04.
import time

import torch
import torchvision.models as models

class Classifier(object):
    def __init__(self, gpuid):
        self.device = torch.device("cuda:{}".format(gpuid))
        self.model = models.densenet201(pretrained=True)
        self.model.to(self.device)
        self.model.eval()
        # forward a fake sample so the GPU memory is allocated up front
        fake = torch.zeros((1, 3, 224, 224), dtype=torch.float32)
        self.model(fake.to(self.device)).cpu()

c1 = Classifier(0)  # add one classifier, GPU memory: 819M
c2 = Classifier(0)  # add another classifier, GPU memory: 903M
time.sleep(100)  # keep the process alive so nvidia-smi can be inspected
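For reference, here is a minimal sketch of how I could measure what PyTorch's own allocator holds per model, separate from whatever nvidia-smi counts at the process level (this assumes torch.cuda.memory_allocated, which is available in PyTorch 1.0 and only tracks tensor allocations, not the CUDA context):

import torch
import torchvision.models as models

device = torch.device("cuda:0")

def allocated_mb():
    # Bytes currently occupied by tensors on device 0, as tracked by
    # PyTorch's caching allocator; this excludes the CUDA context itself.
    return torch.cuda.memory_allocated(device) / 1024 ** 2

print("before: {:.1f} MB".format(allocated_mb()))
m1 = models.densenet201(pretrained=True).to(device).eval()
print("after model 1: {:.1f} MB".format(allocated_mb()))
m2 = models.densenet201(pretrained=True).to(device).eval()
print("after model 2: {:.1f} MB".format(allocated_mb()))

If memory_allocated grows by roughly the same amount for each model while nvidia-smi barely moves after the first one, that would suggest most of the 819M is one-time per-process overhead rather than per-model cost.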