Hi, I am trying to maximize inference throughput on a GPU for my undergrad thesis analysis. In TensorFlow I managed to overlap data loading with GPU computation, but I can't seem to achieve the same in PyTorch.
Essential code:
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

my_net = torch.load(some_file)
my_net.cuda()
my_net.eval()

class MyDataset(Dataset):
    def __len__(self):
        return 100000

    def __getitem__(self, idx):
        # Synthetic 3x224x224 image, so the benchmark isolates transfer + compute.
        image = np.random.randn(3, 224, 224).astype(np.float32)
        return torch.from_numpy(image)

def run_inference(imgs):
    # no_grad replaces the old volatile=True; non_blocking replaces async=True
    # (async is a reserved word in recent Python versions).
    with torch.no_grad():
        r = my_net(imgs.cuda(non_blocking=True))
    return r.cpu()

my_dataset = MyDataset()
dataset_loader = DataLoader(my_dataset,
                            batch_size=FLAGS.batch_size,
                            shuffle=False,
                            num_workers=8,
                            pin_memory=True)  # pinned host memory enables async H2D copies

for i, data in enumerate(dataset_loader):
    run_inference(data)
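For reference, this is roughly how I time each batch (a minimal sketch; since CUDA calls are asynchronous, I synchronize before reading the clock, otherwise the numbers are meaningless):

import time
import torch

torch.cuda.synchronize()   # make sure earlier GPU work has finished
start = time.time()
run_inference(data)
torch.cuda.synchronize()   # wait for the copy + forward pass to complete
print('batch time: %.1f ms' % ((time.time() - start) * 1000))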
For instance, running AlexNet inference with a batch size of 128, I am getting ~70 ms per batch, of which ~13 ms is spent copying data from host to GPU. Would it be possible to hide this cost by overlapping the data copy with GPU computation? How?
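Concretely, is something like the following double-buffered prefetch the right approach? This is only a rough sketch of what I have in mind: CUDAPrefetcher is my own name, and it assumes the DataLoader yields pinned CPU tensors (pin_memory=True), since non_blocking copies are only truly asynchronous from pinned memory.

import torch

class CUDAPrefetcher:
    """Overlap host-to-device copies with compute using a side CUDA stream."""
    def __init__(self, loader):
        self.loader = iter(loader)
        self.copy_stream = torch.cuda.Stream()
        self._preload()

    def _preload(self):
        try:
            cpu_batch = next(self.loader)
        except StopIteration:
            self.next_batch = None
            return
        # Issue the H2D copy on the side stream so it can run concurrently
        # with compute on the default stream.
        with torch.cuda.stream(self.copy_stream):
            self.next_batch = cpu_batch.cuda(non_blocking=True)

    def __iter__(self):
        return self

    def __next__(self):
        if self.next_batch is None:
            raise StopIteration
        # Make the default stream wait until the pending copy has landed.
        torch.cuda.current_stream().wait_stream(self.copy_stream)
        batch = self.next_batch
        # Tell the caching allocator this tensor is now used on the default
        # stream, so its memory isn't reused while the copy stream still owns it.
        batch.record_stream(torch.cuda.current_stream())
        self._preload()  # start copying the next batch while this one computes
        return batch

for batch in CUDAPrefetcher(dataset_loader):
    with torch.no_grad():
        out = my_net(batch)

If this is the right idea, I'd expect the ~13 ms copy to disappear behind the forward pass; if not, what is the idiomatic PyTorch way to do it?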
Thanks!