Tutorial on custom dataloaders (NOT datasets)?

Hi,

I really appreciate this tutorial on custom datasets. However, the torch.utils.data.DataLoader class is only briefly mentioned in it:

However, we are losing a lot of features by using a simple for loop to iterate over the data. In particular, we are missing out on:

  • Batching the data
  • Shuffling the data
  • Loading the data in parallel using multiprocessing workers

torch.utils.data.DataLoader is an iterator which provides all these features. Parameters used below should be clear. One parameter of interest is collate_fn. You can specify how exactly the samples need to be batched using collate_fn. However, the default collate should work fine for most use cases.
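For context, here is a minimal sketch of the three features listed above, using the default collate function. The `SquaresDataset` class and its contents are hypothetical and only serve as an illustration; they are not from the tutorial.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: item i is the pair (i, i**2)."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        x = torch.tensor([float(idx)])
        y = torch.tensor([float(idx) ** 2])
        return x, y


dataset = SquaresDataset(10)

# batch_size groups samples into batches, shuffle=True reorders them every
# epoch, and num_workers=0 loads in the main process (values > 0 start
# worker processes that load batches in parallel).
loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=0)

batch_shapes = []
for x, y in loader:
    # The default collate function stacks the per-sample tensors along a
    # new leading batch dimension.
    batch_shapes.append(tuple(x.shape))
```

With 10 samples and a batch size of 4, the last batch is smaller than the others unless `drop_last=True` is passed.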

Could we possibly get a tutorial on custom dataloaders using the torch.utils.data.DataLoader class? More specifically, it could cover how to interface with its parameters, especially num_workers and collate_fn. An explanation of how to inherit from the abstract base class and a template for collate_fn would be nice too.
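To make the request concrete, this is roughly the kind of collate_fn template I have in mind. It is only a sketch under my own assumptions: `VariableLengthDataset`, `pad_collate`, and `make_loader` are hypothetical names, and padding variable-length sequences is just one example of what a custom collate_fn might do.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset


class VariableLengthDataset(Dataset):
    """Hypothetical dataset: sample i is a 1-D tensor of length i + 1 plus a label."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        sequence = torch.ones(idx + 1)
        label = idx % 2
        return sequence, label


def pad_collate(samples):
    """Receives a list of __getitem__ results and returns one batch."""
    sequences, labels = zip(*samples)
    lengths = torch.tensor([len(s) for s in sequences])
    # Pad every sequence with zeros up to the longest one in this batch.
    padded = pad_sequence(list(sequences), batch_first=True, padding_value=0.0)
    return padded, lengths, torch.tensor(labels)


def make_loader(num_workers=0):
    # num_workers=0 loads in the main process; larger values start that
    # many worker processes, each of which calls pad_collate itself.
    return DataLoader(
        VariableLengthDataset(6),
        batch_size=3,
        shuffle=False,
        collate_fn=pad_collate,
        num_workers=num_workers,
    )


padded, lengths, labels = next(iter(make_loader()))
```

The default collate would fail here because the sequences in a batch have different lengths and cannot be stacked directly, which is the situation where a custom collate_fn seems to be needed. When using num_workers > 0 on platforms that spawn processes, the loop over the loader generally needs to sit under an `if __name__ == "__main__":` guard.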

I am aware of this issue and this issue, but neither has led to a tutorial.

Again, I really appreciate the effort that goes into the tutorials that are currently available, but I feel that a tutorial on custom dataloaders would answer a lot of questions.

That makes sense and might be beneficial for advanced use cases.
Would you be interested in creating such a tutorial, one that digs into the DataLoader internals? :)

Hi @ptrblck!

Thanks for the response.

I would love to make a tutorial, but I am not sure where to start exactly. Any suggestions? Also, are there any prerequisite concepts that I need to know about?

I would suggest creating a feature request here with some information from this topic, such as what is currently not explained well and your suggestions on how to improve it. After a discussion you should be good to go. If you need some help or guidance, feel free to ping me in the created issue.

Hi @ptrblck,

I created the issue here but have not yet received a reply. Any suggestions on what to do next?

Thanks! I started the discussion in the issue.
