Hi,
I really appreciate this tutorial on custom datasets. However, the torch.utils.data.DataLoader
class is only briefly mentioned in it:
However, we are losing a lot of features by using a simple for loop to iterate over the data. In particular, we are missing out on:
- Batching the data
- Shuffling the data
- Loading the data in parallel using multiprocessing workers
torch.utils.data.DataLoader is an iterator which provides all these features. Parameters used below should be clear. One parameter of interest is collate_fn. You can specify how exactly the samples need to be batched using collate_fn. However, the default collate should work fine for most use cases.
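For reference, this is roughly the usage I mean (a minimal sketch; the toy dataset and parameter values are my own, not from the tutorial):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# A toy dataset of 10 (feature, label) pairs.
dataset = TensorDataset(
    torch.arange(10, dtype=torch.float32).unsqueeze(1),  # features, shape (10, 1)
    torch.arange(10),                                    # labels, shape (10,)
)

# DataLoader replaces the plain for loop and adds batching, shuffling,
# and (via num_workers > 0) parallel loading in worker processes.
loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=0)

for features, labels in loader:
    # With 10 samples and batch_size=4, batches have 4, 4, and 2 samples.
    pass
```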
Could we possibly get a tutorial on custom data loading with the torch.utils.data.DataLoader class? More specifically, how to work with its parameters, especially num_workers and collate_fn. An explanation of how to inherit from the Dataset abstract base class and a template for collate_fn would be nice too.
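To make the request concrete, here is roughly the kind of template I have in mind (the dataset class and collate function below are hypothetical, just to illustrate the shape of it):

```python
import torch
from torch.utils.data import Dataset, DataLoader


class VariableLengthDataset(Dataset):
    """Hypothetical dataset of variable-length 1-D tensors."""

    def __init__(self, lengths):
        self.samples = [torch.ones(n) for n in lengths]

    def __len__(self):
        # Required: number of samples.
        return len(self.samples)

    def __getitem__(self, idx):
        # Required: return one sample.
        return self.samples[idx]


def pad_collate(batch):
    """Custom collate_fn: pad a list of 1-D tensors to the longest
    length in the batch and also return the original lengths."""
    lengths = torch.tensor([len(s) for s in batch])
    padded = torch.nn.utils.rnn.pad_sequence(batch, batch_first=True)
    return padded, lengths


# Default collate would fail on unequal lengths; pad_collate handles them.
loader = DataLoader(
    VariableLengthDataset([3, 5, 2, 4]),
    batch_size=2,
    collate_fn=pad_collate,
)
```

Something like this, with an explanation of when the default collate breaks down, would answer most of the questions I see asked.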
I am aware of this issue and this issue, but neither has led to a tutorial.
Again, I really appreciate the effort that goes into the tutorials that are currently available, but I feel that a tutorial on custom dataloaders would answer a lot of questions.