I’m trying to deploy multiple models on one GPU inside a single service. When I add one model on a GPU, the GPU memory consumption is 819M; when I add a second model on the same GPU, the GPU memory consumption increases by only 84M, to 903M. Is this expected behavior? Why?
Below is the demo code; I’m using PyTorch 1.0 on Ubuntu 18.04.
import time

import torch
import torchvision.models as models

class Classifier(object):
    def __init__(self, gpuid):
        self.device = torch.device("cuda:{}".format(gpuid))
        self.model = models.densenet201(pretrained=True)
        self.model.to(self.device)
        self.model.eval()
        # forward a fake sample so the GPU memory is allocated up front
        fake = torch.zeros((1, 3, 224, 224), dtype=torch.float32)
        self.model(fake.to(self.device)).cpu()

c1 = Classifier(0)  # add one classifier, GPU memory: 819M
c2 = Classifier(0)  # add another classifier, GPU memory: 903M
time.sleep(100)  # keep the process alive so nvidia-smi can be inspected
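For reference, here is a minimal sketch of how I could measure what PyTorch's own allocator holds per model, separate from whatever nvidia-smi counts at the process level (this assumes torch.cuda.memory_allocated, which is available in PyTorch 1.0 and only tracks tensor allocations, not the CUDA context):

import torch
import torchvision.models as models

device = torch.device("cuda:0")

def allocated_mb():
    # Bytes currently occupied by tensors on device 0, as tracked by
    # PyTorch's caching allocator; this excludes the CUDA context itself.
    return torch.cuda.memory_allocated(device) / 1024 ** 2

print("before: {:.1f} MB".format(allocated_mb()))
m1 = models.densenet201(pretrained=True).to(device).eval()
print("after model 1: {:.1f} MB".format(allocated_mb()))
m2 = models.densenet201(pretrained=True).to(device).eval()
print("after model 2: {:.1f} MB".format(allocated_mb()))

If memory_allocated grows by roughly the same amount for each model while nvidia-smi barely moves after the first one, that would suggest most of the 819M is one-time per-process overhead rather than per-model cost.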