Pybind11 DDP pickle error

Hello everyone,

A C++ preprocessor class is implemented with pybind11. The C++ class is imported in a customized dataset (torch.utils.data.Dataset), and one of its functions is called from __getitem__. Once I tried to use DDP for multi-process training, the error “cannot pickle preprocessor object” occurred. It works correctly if I use one GPU without DDP (num_workers > 2, num_batch > 2).

It seems DDP needs to pickle everything contained in the dataset class in order to share the dataloader between different processes (link).

I also checked the official pybind11 doc about pickling support: link
But there are no cues about how to pickle a general C++ class that holds pointers (used to fill NumPy arrays).

Am I using DDP wrongly? Or is there any solution that can solve this?

Thanks,
Lin

DDP by itself does not do any pickling, nor does it directly interact with the Dataset or with multiprocessing communication techniques.

This would fall into the domain of the DataLoader or multiprocessing. cc @VitalyFedyunin

Hard to say without seeing a reproducible example, but I would guess that you have custom pybind objects inside your Dataset structures. Try defining __setstate__ and __getstate__ functions for them. Also, in most cases (unless it is shared memory), passing memory pointers between processes will not work, and you need to serialize the data itself.
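
For anyone hitting the same error later, here is a minimal sketch of what that can look like using pybind11's py::pickle helper (the Preprocessor class and its weights member are hypothetical stand-ins, not from the original post): __getstate__ copies the pointed-to data into picklable Python objects, and __setstate__ rebuilds the object on the other side of the process boundary.

```cpp
// Sketch only: serialize the *data*, never the raw pointer.
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>   // casting std::vector <-> Python list
#include <stdexcept>
#include <vector>

namespace py = pybind11;

class Preprocessor {
public:
    explicit Preprocessor(std::vector<float> weights)
        : weights_(std::move(weights)) {}
    const std::vector<float>& weights() const { return weights_; }
private:
    std::vector<float> weights_;  // stands in for pointer-backed state
};

PYBIND11_MODULE(preproc, m) {
    py::class_<Preprocessor>(m, "Preprocessor")
        .def(py::init<std::vector<float>>())
        .def(py::pickle(
            // __getstate__: copy the underlying data into a picklable tuple
            [](const Preprocessor& p) {
                return py::make_tuple(p.weights());
            },
            // __setstate__: reconstruct the object from that data
            [](py::tuple t) {
                if (t.size() != 1)
                    throw std::runtime_error("Invalid pickle state!");
                return Preprocessor(t[0].cast<std::vector<float>>());
            }));
}
```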

Thanks @VitalyFedyunin. Problem solved by defining __setstate__ and __getstate__ as described in the official pybind11 doc.

Just define __setstate__/__getstate__ for the parameters used in the class's constructor.
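
A hedged sketch of that pattern (the class name, config_path, and num_channels are made up for illustration): __getstate__ captures only the constructor arguments, and __setstate__ simply reruns the constructor with them, so any internal buffers or NumPy arrays get re-allocated in the new process instead of being pickled.

```cpp
#include <pybind11/pybind11.h>
#include <string>

namespace py = pybind11;

class Preprocessor {
public:
    Preprocessor(const std::string& config_path, int num_channels)
        : config_path_(config_path), num_channels_(num_channels) {
        // Heavy setup (allocating buffers, filling arrays, ...) happens
        // here, so it is redone automatically on unpickling.
    }
    const std::string& config_path() const { return config_path_; }
    int num_channels() const { return num_channels_; }
private:
    std::string config_path_;
    int num_channels_;
};

PYBIND11_MODULE(preproc, m) {
    py::class_<Preprocessor>(m, "Preprocessor")
        .def(py::init<const std::string&, int>())
        .def(py::pickle(
            // __getstate__: only the constructor parameters
            [](const Preprocessor& p) {
                return py::make_tuple(p.config_path(), p.num_channels());
            },
            // __setstate__: rerun the constructor with those parameters
            [](py::tuple t) {
                return Preprocessor(t[0].cast<std::string>(),
                                    t[1].cast<int>());
            }));
}
```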