I am using DistributedDataParallel in PyTorch for multi-GPU training, with one process per GPU.
When I follow the standard pattern in PyTorch, the measured inference time looks fine, like this:
x, y = next(train_loader)        # train_loader here is an iterator over the DataLoader
x = x.cuda(rank)
y = y.cuda(rank)
t0 = time.time()
y1 = model(x)
torch.cuda.synchronize()         # wait for the forward pass to finish before stopping the timer
inference_time = time.time() - t0
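For reference, I believe the same measurement could also be written with CUDA events instead of time.time() plus synchronize. This is just a minimal sketch continuing from the snippet above (x and model as before, variable names are mine):

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

start.record()                               # recorded on the current CUDA stream
y1 = model(x)
end.record()
torch.cuda.synchronize()                     # make sure both events have completed
inference_time_ms = start.elapsed_time(end)  # elapsed GPU time in milliseconds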
But when I get the data from another thread, which continuously reads batches from train_loader and puts them into a queue, the timing changes. The code is as follows.
import queue
import threading

args.data_queue = queue.Queue()

def load_data_queue(rank, dataloader, args):
    while True:
        try:
            x, y = next(dataloader)     # dataloader is an iterator over the DataLoader
            x = x.cuda(rank)            # host-to-device copy is issued from this thread
            y = y.cuda(rank)
            args.data_queue.put([x, y])
        except StopIteration:
            print('load queue quits normally')
            return
...
t = threading.Thread(target=load_data_queue,
                     args=(rank, train_loader, args), daemon=True)
t.start()
...
x, y = args.data_queue.get()
t0 = time.time()
y1 = model(x)
# torch.cuda.synchronize()
inference_time = time.time() - t0
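In case it helps, here is a stripped-down, self-contained sketch of this second setup. The model, the fake loader, and the queue size are placeholders I made up for the repro, not my real code, and it runs on a single GPU instead of under DDP:

import queue
import threading
import time

import torch
import torch.nn as nn

rank = 0                                     # single-GPU repro; the real code uses DDP ranks
model = nn.Linear(1024, 1024).cuda(rank)     # placeholder model

def fake_loader(n_batches=100, batch_size=64):
    # stand-in for the real DataLoader iterator
    for _ in range(n_batches):
        yield torch.randn(batch_size, 1024), torch.randint(0, 10, (batch_size,))

data_queue = queue.Queue(maxsize=8)

def load_data_queue(rank, dataloader):
    while True:
        try:
            x, y = next(dataloader)
            x = x.cuda(rank)                 # copies issued from the loader thread
            y = y.cuda(rank)
            data_queue.put([x, y])
        except StopIteration:
            return

loader_it = fake_loader()
t = threading.Thread(target=load_data_queue, args=(rank, loader_it), daemon=True)
t.start()

for _ in range(10):
    x, y = data_queue.get()
    t0 = time.time()
    y1 = model(x)
    # torch.cuda.synchronize()               # commented out, as in my real code
    inference_time = time.time() - t0
    print(inference_time)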
With this version, the measured inference_time increases a lot.
To my understanding, GPU I/O (the host-to-device copies in the loader thread) should not affect GPU compute. What is causing this?