I have a fairly simple training script that
- Reads data from parquet into a pandas DF
- Pushes data into a torch tensor
- Uses TensorDataset/DistributedSampler/DataLoader to load data during training
- Uses DistributedDataParallel to manage distributed training across GPUs of a single instance.
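Roughly, each spawned worker does the following (simplified sketch; `args.data_path`, the column names, and `MyModel` stand in for my real code):

```python
import pandas as pd
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def train(gpu, args):
    # One process per GPU on a single machine, so rank == gpu.
    # MASTER_ADDR / MASTER_PORT are set in the environment.
    dist.init_process_group(backend="nccl", init_method="env://",
                            world_size=args.gpus, rank=gpu)
    torch.cuda.set_device(gpu)

    # This part runs in every spawned process: each one re-reads the
    # parquet file and builds its own full copy of the tensors.
    df = pd.read_parquet(args.data_path)
    features = torch.tensor(df[args.feature_cols].values, dtype=torch.float32)
    labels = torch.tensor(df[args.label_col].values, dtype=torch.float32)

    dataset = TensorDataset(features, labels)
    sampler = DistributedSampler(dataset, num_replicas=args.gpus, rank=gpu)
    loader = DataLoader(dataset, batch_size=args.batch_size,
                        shuffle=False, sampler=sampler)

    model = MyModel().cuda(gpu)
    model = DDP(model, device_ids=[gpu])
    # ... usual training loop over loader ...
```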
However, I know that when I call `mp.spawn(train, nprocs=args.gpus, args=(args,))`, the code that reads my feature and label data runs in every spawned process, so each worker presumably ends up holding its own full copy of the data. I'm sure this causes unnecessary memory and CPU overhead on the machine. Is there an obvious way to avoid this?
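For completeness, the entry point is just:

```python
import torch.multiprocessing as mp

if __name__ == "__main__":
    args = parse_args()  # argparse wrapper, omitted here
    # mp.spawn launches args.gpus fresh processes, each running train()
    # from the top -- including the parquet read and tensor construction.
    mp.spawn(train, nprocs=args.gpus, args=(args,))
```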
Thanks so much!
-Sohrab Andaz