How is DataLoader's dataset argument accessed by worker processes?

I’d like to ask how the “dataset” argument of DataLoader is accessed by worker processes during multi-process data loading.

My dataset object contains a database handle that can’t be shared across multiple processes. Single-process data loading works as intended. During multi-process data loading, all processes erroneously attempt to use the same database connection handle.

I’ve implemented custom copying and pickling that create dedicated database connections for each object. This works as intended when I deepcopy or pickle the object myself. However, it doesn’t seem to solve this problem.
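For reference, here is a minimal sketch of what such copy/pickle hooks might look like. The class name DbDataset, the helper make_demo_db, the "items" table, and the use of SQLite are all assumptions made for illustration, not the actual code from this post. The sketch drops the live connection from the pickled state and opens a fresh one on restore; copy.deepcopy falls back to the same two hooks.

```python
import copy
import os
import sqlite3
import tempfile


def make_demo_db(path, values):
    """Create a small throwaway SQLite database (illustration only)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value TEXT)")
    conn.executemany(
        "INSERT INTO items (id, value) VALUES (?, ?)", list(enumerate(values))
    )
    conn.commit()
    conn.close()


class DbDataset:
    """Hypothetical map-style dataset that owns a database connection."""

    def __init__(self, db_path):
        self.db_path = db_path
        self.conn = sqlite3.connect(db_path)

    def __len__(self):
        return self.conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]

    def __getitem__(self, index):
        row = self.conn.execute(
            "SELECT value FROM items WHERE id = ?", (index,)
        ).fetchone()
        if row is None:
            raise IndexError(index)
        return row[0]

    def __getstate__(self):
        # Leave the live handle out of the serialized state.
        state = self.__dict__.copy()
        del state["conn"]
        return state

    def __setstate__(self, state):
        # Every restored copy opens its own connection.
        self.__dict__.update(state)
        self.conn = sqlite3.connect(self.db_path)


def demo():
    """Deep-copy a dataset and check that the copy has its own connection."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        make_demo_db(path, ["a", "b", "c"])
        original = DbDataset(path)
        clone = copy.deepcopy(original)  # routed through the two hooks above
        result = (clone.conn is not original.conn, len(clone), clone[1])
        original.conn.close()
        clone.conn.close()
        return result
```

One thing worth keeping in mind: when worker processes are started with the fork start method, they inherit the parent's objects through the forked memory image rather than through pickle, so hooks like these would not run in that case. With the spawn start method the dataset is pickled and the hooks do run.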

How do I open a dedicated database connection when my dataset object is sent to, or accessed by, a worker process?
Thanks for any pointers.


I’ve found a hacky solution: the dataset detects whether it is currently running inside a worker process, and if so triggers a one-time close and reopen of the database connection. This results in each loading process using a dedicated database connection, as intended.
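A rough sketch of this kind of workaround follows. It is not the exact code described above: here the detection is done by comparing os.getpid() against the PID that opened the connection, which is a substitute technique chosen so the example runs without PyTorch. The names PerProcessConnection and LazyDbDataset, the "items" table, and the use of SQLite are hypothetical.

```python
import os
import sqlite3


class PerProcessConnection:
    """Hypothetical helper that hands out one connection per process."""

    def __init__(self, db_path):
        self.db_path = db_path
        self._conn = None
        self._owner_pid = None

    def get(self):
        pid = os.getpid()
        if self._conn is None or self._owner_pid != pid:
            # Either nothing is open yet, or the handle was inherited from
            # another process. Replace it with a fresh connection; whether
            # the inherited handle should also be closed explicitly depends
            # on the database driver.
            self._conn = sqlite3.connect(self.db_path)
            self._owner_pid = pid
        return self._conn


class LazyDbDataset:
    """Hypothetical map-style dataset that connects lazily, per process."""

    def __init__(self, db_path):
        self._db = PerProcessConnection(db_path)

    def __len__(self):
        cursor = self._db.get().execute("SELECT COUNT(*) FROM items")
        return cursor.fetchone()[0]

    def __getitem__(self, index):
        row = self._db.get().execute(
            "SELECT value FROM items WHERE id = ?", (index,)
        ).fetchone()
        if row is None:
            raise IndexError(index)
        return row[0]
```

Because the connection is only opened on first access, the check runs once per process in practice: after the first call in a worker, the stored PID matches and the same connection is reused.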

I’d much rather do this at the point where the dataset object is handed over to the worker process, but I still haven’t had any luck implementing that.