Is the Dataset copied when a DataLoader uses multiple workers?

:question: Does the DataLoader copy the Dataset to each worker?

  • The documentation doesn’t use plain English. It mentions multiprocessing details that I'm not familiar with :sweat:

  • I assume it copies, since I have had to use hacks to avoid unpicklable objects, e.g. when using HDF5

I gotta implement a quick & dirty hack to improve the speed of one dataset.

Yes, each worker process would create its own copy of the Dataset.
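
If you want to check this on your own machine, here is a minimal sketch. `CountingDataset` and `inspect_workers` are made-up names for illustration, not part of PyTorch; the only library calls used are `DataLoader` and `torch.utils.data.get_worker_info`. The idea is that a counter mutated inside `__getitem__` only changes on the worker's copy, so the Dataset object held by the main process should stay untouched.

```python
import os

import torch
from torch.utils.data import DataLoader, Dataset, get_worker_info


class CountingDataset(Dataset):
    """Hypothetical toy dataset that counts __getitem__ calls on this copy."""

    def __init__(self, size=8):
        self.size = size
        self.calls = 0  # mutated only on the copy that serves the request

    def __len__(self):
        return self.size

    def __getitem__(self, idx):
        self.calls += 1
        info = get_worker_info()  # None when running in the main process
        worker_id = -1 if info is None else info.id
        return idx, worker_id, os.getpid(), self.calls


def inspect_workers(num_workers=2):
    """Iterate once and report what the parent process observes."""
    dataset = CountingDataset()
    loader = DataLoader(dataset, batch_size=1, num_workers=num_workers)
    worker_pids = set()
    for idx, worker_id, pid, calls in loader:
        worker_pids.add(int(pid))  # each field arrives as a 1-element tensor
    # dataset.calls is read from the parent's object, not from a worker copy
    return dataset.calls, worker_pids, os.getpid()


if __name__ == "__main__":
    parent_calls, worker_pids, parent_pid = inspect_workers()
    print("calls recorded on the parent's dataset:", parent_calls)
    print("worker pids:", worker_pids, "parent pid:", parent_pid)
```

With `num_workers=0` the same loop runs in the main process, so the parent's counter does change. The `if __name__ == "__main__":` guard matters on platforms where workers are started via spawn.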


Thanks!

Is there any (blog) post discussing how to find bottlenecks and improve the speed of a Dataset?

  • If so, please comment.

  • If you would be interested in reading one, :raised_back_of_hand:

  • If you are interested in writing one, react with :smirk:

You could take a look at this post, which explains data loading bottlenecks and some potential workarounds.
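
As a rough first check before digging deeper, you could time one pass over the DataLoader with no model attached and compare a few `num_workers` values. This is only a sketch under assumptions: `SlowDataset` and `time_one_pass` are hypothetical names, and the `time.sleep` call is a stand-in for whatever your real `__getitem__` does (disk reads, decoding, augmentation).

```python
import time

import torch
from torch.utils.data import DataLoader, Dataset


class SlowDataset(Dataset):
    """Hypothetical dataset whose __getitem__ simulates slow I/O or decoding."""

    def __init__(self, size=32, delay=0.005):
        self.size = size
        self.delay = delay

    def __len__(self):
        return self.size

    def __getitem__(self, idx):
        time.sleep(self.delay)  # stand-in for the real per-sample work
        return torch.tensor(idx)


def time_one_pass(dataset, num_workers, batch_size=4):
    """Return (seconds, samples_seen) for one full pass over the loader."""
    loader = DataLoader(dataset, batch_size=batch_size, num_workers=num_workers)
    seen = 0
    start = time.perf_counter()
    for batch in loader:
        seen += batch.shape[0]
    return time.perf_counter() - start, seen


if __name__ == "__main__":
    dataset = SlowDataset()
    for workers in (0, 2):
        seconds, seen = time_one_pass(dataset, workers)
        print(f"num_workers={workers}: {seen} samples in {seconds:.3f}s")
```

If the pass takes about as long as a training epoch, the data loading is likely the bottleneck. Note that worker startup has its own cost, so more workers is not automatically faster for small or cheap datasets.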


Awesome. Added to my bookmarks!