I run the same code on a single GPU. When only one process is running, inference takes about 5 ms per image and GPU utilization (gpu-util) is about 50%. When I start a second copy of the same program, both processes take about 10 ms per image, and GPU utilization is still about 50%. Why does this happen, and how can I solve it?
    import torch
    import torchvision.models as models
    import numpy as np
    import time
    import os

    os.environ["CUDA_VISIBLE_DEVICES"] = '0'

    resnet18 = models.resnet18()
    resnet18.avgpool = torch.nn.AdaptiveAvgPool2d(1)
    im = torch.from_numpy(np.random.rand(1, 3, 100, 100).astype(np.float32))

    resnet18 = resnet18.cuda()
    im = im.cuda()

    for i in range(10000):
        t1 = time.time()
        pred = resnet18(im)
        t2 = time.time()
        print(t2 - t1)
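One caveat about the timing loop: CUDA kernels are launched asynchronously, so `time.time()` around a single forward pass may not capture all of the GPU work. Below is a minimal sketch of a timing helper that waits for the device before reading the clock. The helper name `mean_latency` and its `sync`, `warmup`, and `iters` parameters are my own choices for illustration, not part of PyTorch.

```python
import time


def mean_latency(step, sync=lambda: None, warmup=10, iters=100):
    """Return the mean wall-clock seconds per call of `step`.

    `sync` is a callable that blocks until queued device work has finished.
    The default does nothing, which is only appropriate for CPU-only code.
    """
    # Warm-up calls are excluded from the measurement.
    for _ in range(warmup):
        step()
    sync()  # make sure warm-up work is done before starting the clock

    start = time.perf_counter()
    for _ in range(iters):
        step()
    sync()  # wait for all timed work to finish before stopping the clock
    return (time.perf_counter() - start) / iters
```

With the script above, this could be called as `mean_latency(lambda: resnet18(im), sync=torch.cuda.synchronize)`. Averaging over many iterations also smooths out per-call jitter.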
You can use this code to reproduce the problem. First, run the script by itself; it takes about 5 ms per image. Then run the script in two processes on the same GPU; each process now takes about 10 ms per image.
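To make the two-process run easy to repeat, here is a rough sketch of a launcher that starts several copies of a command at the same time and collects their output. It uses only the standard library. The function name `run_concurrently` and the file name `bench.py` are hypothetical; substitute whatever name the script is saved under.

```python
import subprocess
import sys
import time


def run_concurrently(cmd, copies=2):
    """Start `copies` instances of `cmd` at once and wait for all of them.

    Returns (outputs, return_codes, elapsed_seconds).
    """
    start = time.perf_counter()
    # Launch every process before waiting on any, so they overlap in time.
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
        for _ in range(copies)
    ]
    # communicate() reads stdout to the end and waits for the process to exit.
    outputs = [p.communicate()[0] for p in procs]
    elapsed = time.perf_counter() - start
    codes = [p.returncode for p in procs]
    return outputs, codes, elapsed


# Example (assumes the script is saved as the hypothetical "bench.py"):
# outputs, codes, elapsed = run_concurrently([sys.executable, "bench.py"], copies=2)
```

Running it with `copies=1` and then `copies=2` gives the two cases described above from a single entry point.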