Only one GPU is working while using nn.DataParallel

I am working on a machine with 2 Titan GPUs and using torchvision.models.resnet50 for feature extraction. I wrap the network with para_model = torch.nn.DataParallel(pretrained_model) and expect both of my GPUs to work together. But when I check with nvidia-smi, I find only gpu0 is under full load while gpu1's memory usage stays at zero. However, when I test in a Python shell (also with the same pretrained network), both cards work as expected.
I am wondering if there are some options I need to set beyond just calling DataParallel.
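
For reference, my setup is roughly the following (a minimal sketch; the batch size and input shape are placeholders, not my real values):

```python
import torch
import torchvision.models as models

# Pretrained ResNet-50 used as the feature extractor
pretrained_model = models.resnet50(pretrained=True).cuda()

# DataParallel replicates the model on all visible GPUs and
# splits each input batch along dim 0 across them
para_model = torch.nn.DataParallel(pretrained_model)

# Forward pass; with 2 GPUs each replica should see half the batch
inputs = torch.randn(64, 3, 224, 224).cuda()
features = para_model(inputs)
```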

UPDATE:
I made a simplified test script (only the pretrained model processing random CUDA tensors) and found that it works as expected. So I guess it might be an issue with multiprocessing, as I start a separate process which loads data and passes numpy arrays to the main process via a Queue. Is there any way to prove this assumption?
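
For context, the data-loading part looks roughly like this (a sketch; the function name, shapes, and queue size are assumptions, not my exact code):

```python
import multiprocessing as mp
import numpy as np
import torch

def load_worker(queue):
    # Separate process: load/preprocess data and hand numpy arrays
    # to the main process through the queue
    while True:
        batch = np.random.randn(32, 3, 224, 224).astype(np.float32)  # stand-in for real loading
        queue.put(batch)

if __name__ == "__main__":
    queue = mp.Queue(maxsize=4)
    proc = mp.Process(target=load_worker, args=(queue,), daemon=True)
    proc.start()

    batch = queue.get()
    inputs = torch.from_numpy(batch).cuda()  # converted to a CUDA tensor in the main process
```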

@Varg_Nord you can check this assumption by starting the DataLoader with num_workers=0 (so that it does not use multiprocessing).
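
Something like this (a sketch; the dummy dataset and batch size are placeholders for your own):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset standing in for the real one
dataset = TensorDataset(torch.randn(256, 3, 224, 224))

# num_workers=0 loads batches in the main process (no worker subprocesses),
# which takes multiprocessing out of the picture
loader = DataLoader(dataset, batch_size=64, num_workers=0)
```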