I am training my model in distributed mode (DDP).
My dataset is small enough to fit in memory, so I would like to load it once instead of reading the images from disk and decoding them each time.
I found this example: shared_array.py in the ptrblck/pytorch_misc repository on GitHub.
It is inspiring, but it only supports single-GPU training rather than DDP. Besides, it only supports the scenario where all input images have identical sizes. What if my dataset contains images of various sizes?
Maybe check the shared_dict example to add the tensors to a dict, which would support various shapes.
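To illustrate the dict idea, here is a minimal sketch. It is not the code from the linked example; `InMemoryImageDataset`, `list_collate`, and the fake images are hypothetical names made up for illustration. It assumes the images are already decoded into CPU tensors, and it uses `share_memory_()` so the tensor storage can be shared with DataLoader workers.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class InMemoryImageDataset(Dataset):
    """Hypothetical dataset keeping pre-decoded images of varying shapes in a dict."""

    def __init__(self, images):
        # Map sample index -> tensor; every tensor may have its own shape.
        # share_memory_() moves the CPU storage into shared memory in place.
        self.cache = {i: img.share_memory_() for i, img in enumerate(images)}

    def __len__(self):
        return len(self.cache)

    def __getitem__(self, idx):
        return self.cache[idx]


def list_collate(batch):
    # The default collate function stacks tensors and fails on mixed shapes,
    # so return the samples as a plain list instead.
    return list(batch)


# Stand-in for decoded images: C x H x W tensors with different H and W.
fake_images = [torch.zeros(3, 8 + i, 6 + 2 * i) for i in range(4)]

dataset = InMemoryImageDataset(fake_images)
# num_workers=0 keeps the sketch simple; raise it for real training.
loader = DataLoader(dataset, batch_size=2, collate_fn=list_collate, num_workers=0)

shapes = [tuple(img.shape) for batch in loader for img in batch]
print(shapes)
```

With many images, one shared tensor per image can hit the open file descriptor limit, so this is only a starting point.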
Thanks for replying! Does this support sharing among different GPUs? I mean, I am not only using multiple DataLoader workers but also distributed training. Can I simply keep a single copy in memory for all processes, both the DataLoader workers and the per-GPU processes?
I don’t know, as I haven’t tried this use case, but usually you would use a DistributedSampler in a DDP setting, which would make sure that each process loads only the chunk of the data used on its corresponding device.
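As a rough sketch of how `DistributedSampler` splits the indices, the snippet below builds one sampler per rank by hand. The world size of 2, the toy `TensorDataset`, and the helper `indices_for_rank` are assumptions for illustration; passing `num_replicas` and `rank` explicitly lets it run without initializing a process group.

```python
import torch
from torch.utils.data import TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Toy dataset with 10 samples (assumption for illustration).
dataset = TensorDataset(torch.arange(10))
world_size = 2  # assumed number of DDP processes


def indices_for_rank(rank):
    # Hypothetical helper: list the sample indices one rank would see.
    sampler = DistributedSampler(
        dataset, num_replicas=world_size, rank=rank, shuffle=False
    )
    return list(sampler)


rank0 = indices_for_rank(0)
rank1 = indices_for_rank(1)
print(rank0, rank1)
```

In an actual DDP run you would normally leave out `num_replicas` and `rank` so they are taken from the initialized process group, pass the sampler to the `DataLoader` via `sampler=`, and call `sampler.set_epoch(epoch)` each epoch when shuffling. Note that the sampler only selects indices; whether the in-memory data itself is shared between processes is a separate question.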