I am using a DataParallel model and multithreading to load the input and ground truth. I use a Python queue to synchronize between the main thread and the loader threads. The code looks like the following pseudocode:
import queue
import threading as T
import torch as th
from torch.autograd import Variable

Q_dirs = queue.Queue()
Q_tensors = queue.Queue(maxsize=4)

model = th.nn.DataParallel(model, device_ids=cuda_devices,
                           output_device=cuda_devices[0]).cuda()

def loader():
    global Q_dirs
    global Q_tensors
    while True:
        list_dirs = Q_dirs.get()
        tensor_data = {}
        # for each dir in list_dirs, read the tensors from disk and fill tensor_data:
        #   tensor_data['features'] = features
        #   tensor_data['q_gt'] = q_gt
        Q_tensors.put(tensor_data)

# start the loader threads
loaders = []
for i in range(4):
    loaders.append(T.Thread(name='L' + str(i), target=loader))
for l in loaders:
    l.start()

# in main thread: enqueue the list of sample directories for every batch
for i in range(no_of_batches):
    list_dirs = []
    for j in range(current_batch_size):
        list_dirs.append(dirs[i * bs + j])
    Q_dirs.put(list_dirs)

# consume the loaded tensors and run the forward pass ('bar' is just a progress-bar wrapper)
for i in bar(range(no_of_batches)):
    tensor_data = Q_tensors.get()
    features = tensor_data['features']
    q_gt = tensor_data['q_gt']
    input = Variable(features, requires_grad=False)
    q_gt = Variable(q_gt, requires_grad=False)
    q_pred = model(input)
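To rule out a bug in the queue hand-off itself, the same producer/consumer pattern can be exercised with the data loading and the forward pass stubbed out. This is only a sketch (the dummy tensor shapes are placeholders), but it runs exactly the same Queue.put / Queue.get logic without ever touching the GPUs:

def dummy_loader():
    # same structure as loader(), but fabricates a small CPU tensor
    # instead of reading anything from disk
    while True:
        list_dirs = Q_dirs.get()
        Q_tensors.put({'features': th.zeros(len(list_dirs), 16),
                       'q_gt': th.zeros(len(list_dirs), 1)})

for i in range(4):
    T.Thread(name='D' + str(i), target=dummy_loader, daemon=True).start()

for i in range(no_of_batches):
    Q_dirs.put(dirs[i * bs:(i + 1) * bs])

for i in range(no_of_batches):
    # if this loop ever blocks forever, the problem is in the queue code,
    # not in DataParallel
    tensor_data = Q_tensors.get()

If that version finishes cleanly every time, the hang is presumably inside model(input) rather than in the synchronization.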
But the DataParallel model hangs in the forward pass. At the point where it hangs, I can see that only one of the GPU cards (the one I set as output_device when creating the DataParallel wrapper) is at 100% utilization, while all the other cards are at 0%. This happens at various epochs and is not always consistent. I suspect there is a race condition in my code, but given that I only use thread-safe Python 3.5 queues for synchronization, I am puzzled about what the issue could be. Could someone throw some light on what is going on here?
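For reference, here is a sketch of how the training loop could be instrumented to dump the stack of every Python thread when an iteration stalls, using only the standard faulthandler module (the 600-second threshold is an arbitrary placeholder):

import faulthandler
import sys

for i in bar(range(no_of_batches)):
    # (re)arm the watchdog: if this iteration takes longer than 600 s,
    # the traceback of every thread is written to stderr
    faulthandler.dump_traceback_later(timeout=600, file=sys.stderr)
    tensor_data = Q_tensors.get()
    features = tensor_data['features']
    q_gt = tensor_data['q_gt']
    q_pred = model(Variable(features, requires_grad=False))
faulthandler.cancel_dump_traceback_later()

That should at least show whether the main thread is blocked inside Q_tensors.get() or inside the forward pass of the DataParallel model.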
Some additional information about my system:
OS: Ubuntu 16.04.2 LTS
Python: 3.5
CUDA: 8.0
PyTorch version: 0.1.11_4