I am using DistributedDataParallel for multi-GPU training in PyTorch, with one process per GPU.
When I follow the usual PyTorch pattern, the inference time looks fine:
```python
x, y = next(train_loader)   # train_loader is used as an iterator here
x = x.cuda(rank)
y = y.cuda(rank)
t0 = time.time()
y1 = model(x)
torch.cuda.synchronize()
inference_time = time.time() - t0
```
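For reference, the forward pass can also be timed with CUDA events, which measure elapsed time on the GPU itself rather than relying on when the host calls time.time(). A minimal sketch, assuming model and x are set up as above:

```python
import torch

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

start.record()
y1 = model(x)
end.record()

torch.cuda.synchronize()                      # wait until the recorded events have completed
inference_time_ms = start.elapsed_time(end)   # elapsed GPU time in milliseconds
```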
But when I get the data from another thread, which keeps reading batches from train_loader and putting them into a queue, the behavior changes. The code is as follows:
```python
import queue
import threading

args.data_queue = queue.Queue()

def load_data_queue(rank, dataloader, args):
    # Producer thread: read batches, move them to GPU `rank`, push them into the queue.
    while True:
        try:
            x, y = next(dataloader)
            x = x.cuda(rank)
            y = y.cuda(rank)
            args.data_queue.put([x, y])
        except StopIteration:
            print('load queue quits normally')
            return

...
t = threading.Thread(target=load_data_queue,
                     args=(rank, train_loader, args),
                     daemon=True)
t.start()
...
# Training loop: take a batch from the queue and time the forward pass.
x, y = args.data_queue.get()
t0 = time.time()
y1 = model(x)
# torch.cuda.synchronize()
inference_time = time.time() - t0
```
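As a side note on the queue approach, the producer could push a sentinel on StopIteration so the consumer knows when the loader is exhausted. A rough sketch of that variant, reusing the names above (SENTINEL is just a marker introduced for this sketch):

```python
SENTINEL = None  # hypothetical end-of-data marker, not in the original code

def load_data_queue(rank, dataloader, args):
    # Producer: move each batch to GPU `rank` and hand it to the consumer.
    while True:
        try:
            x, y = next(dataloader)
        except StopIteration:
            args.data_queue.put(SENTINEL)  # signal the consumer to stop
            print('load queue quits normally')
            return
        x = x.cuda(rank)
        y = y.cuda(rank)
        args.data_queue.put([x, y])

# Consumer side:
while True:
    item = args.data_queue.get()
    if item is SENTINEL:
        break
    x, y = item
    y1 = model(x)
```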
With the queue-based version, inference_time increases a lot.
To my understanding, GPU I/O should not affect GPU compute. What is causing this?