Hello everyone,
As suggested by @rvarm1, I changed the topic category (distributed -> dataloader) and am re-posting it here. The following is the original problem:
A C++ preprocessor class is implemented with pybind11. The C++ class is imported in a customized dataset (torch.utils.data.Dataset), and one of its functions is called in __getitem__. When I tried to use DDP for multi-GPU training, the error "cannot pickle preprocessor object" occurred. It works correctly if I use a single GPU without DDP (num_workers > 2, num_batch > 2).
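For reference, here is a minimal sketch of the setup that reproduces the error. All names are placeholders, and a pure-Python stand-in simulates the pybind11 class (like a pybind11 class without pickle support, it refuses to be pickled):

```python
import pickle

class Preprocessor:
    """Stand-in for the pybind11-wrapped C++ preprocessor."""
    def process(self, raw):
        # Pretend this fills a numpy array through C++ pointers.
        return [x * 2 for x in raw]

    def __reduce__(self):
        # Simulates a pybind11 class without py::pickle support.
        raise TypeError("cannot pickle Preprocessor object")

class MyDataset:
    """Simplified stand-in; the real class subclasses torch.utils.data.Dataset."""
    def __init__(self, samples):
        self.samples = samples
        self.preprocessor = Preprocessor()  # held as an attribute

    def __getitem__(self, idx):
        return self.preprocessor.process(self.samples[idx])

    def __len__(self):
        return len(self.samples)

ds = MyDataset([[1, 2], [3, 4]])
print(ds[0])  # fine in a single process

try:
    # Roughly what DataLoader worker spawning does with the dataset.
    pickle.dumps(ds)
except TypeError as e:
    print("pickle failed:", e)
```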
It seems DDP/DataLoader needs to pickle everything contained in the dataset class in order to share the dataset between worker processes (link).
I also checked the official pybind11 documentation on pickling support, but there are no clues about how to pickle a general C++ class that holds pointers (used to fill NumPy arrays).
Am I using DDP incorrectly, or is there a solution for this?
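In case it helps frame the question: a workaround I'm wondering about is to exclude the C++ object from pickling on the Python side and rebuild it lazily in each worker process. A sketch with a pure-Python stand-in for the pybind11 class (I don't know whether this is the recommended approach):

```python
import pickle

class Preprocessor:
    """Stand-in for the unpicklable pybind11-wrapped C++ class."""
    def process(self, raw):
        return [x * 2 for x in raw]

    def __reduce__(self):
        raise TypeError("cannot pickle Preprocessor object")

class MyDataset:
    """Simplified stand-in; the real class subclasses torch.utils.data.Dataset."""
    def __init__(self, samples):
        self.samples = samples
        self._preprocessor = None  # created lazily, never pickled

    @property
    def preprocessor(self):
        # Each worker process builds its own instance on first use.
        if self._preprocessor is None:
            self._preprocessor = Preprocessor()
        return self._preprocessor

    def __getitem__(self, idx):
        return self.preprocessor.process(self.samples[idx])

    def __len__(self):
        return len(self.samples)

    def __getstate__(self):
        # Drop the C++ object before the dataset is pickled for workers.
        state = self.__dict__.copy()
        state["_preprocessor"] = None
        return state

ds = MyDataset([[1, 2], [3, 4]])
print(ds[0])                            # instantiates the preprocessor
clone = pickle.loads(pickle.dumps(ds))  # now picklable
print(clone[1])                         # the copy rebuilds it lazily
```

But I'm not sure whether this plays well with DDP, or whether py::pickle support on the C++ side would be the better fix.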
Thanks,
Lin