Some GPUs show 100% utilization while others show 0%

The training task is image classification on the ImageNet-22k dataset. The images are packed into a single file of about 1.4 TB. At startup, each process needs to download the whole dataset after init_process_group and before the first model forward/backward pass; the download normally takes 3+ hours and goes through Microsoft Azure blobfuse. In this setup I find that the GPUs on some nodes sit at 100% utilization while the GPUs on other nodes sit at 0%, and training never proceeds. The PyTorch version is 1.6. Are there any potential issues in PyTorch under this particular situation?
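For reference, here is a minimal sketch of the order of operations described above, not my actual training script. It assumes the NCCL backend with one process per GPU started by a standard launcher (which sets the rendezvous environment variables); `download_dataset` and the mount path are hypothetical stand-ins for the 3+ hour blobfuse copy.

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def download_dataset(local_path):
    # Placeholder for copying the ~1.4 TB file through Azure blobfuse;
    # in the real job this reportedly takes 3+ hours.
    pass


def main():
    local_rank = int(os.environ.get("LOCAL_RANK", 0))  # set by the launcher
    torch.cuda.set_device(local_rank)

    # 1) The process group is created first ...
    dist.init_process_group(backend="nccl")

    # 2) ... then each process downloads/prepares the data (3+ hours),
    #    during which no collective calls are issued.
    download_dataset("/mnt/blobfuse/imagenet22k")

    # 3) Only afterwards does the model run forward/backward under DDP.
    num_classes = 21841  # ImageNet-22k class count; adjust as needed
    model = DDP(nn.Linear(2048, num_classes).cuda(local_rank),
                device_ids=[local_rank])
    # ... training loop with forward / backward / optimizer step ...


if __name__ == "__main__":
    main()
```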