Hi,
I really appreciate this tutorial on custom datasets. However, the torch.utils.data.DataLoader
class is only briefly mentioned in it:
However, we are losing a lot of features by using a simple for loop to iterate over the data. In particular, we are missing out on:
- Batching the data
- Shuffling the data
- Loading the data in parallel using multiprocessing workers
torch.utils.data.DataLoader is an iterator which provides all these features. Parameters used below should be clear. One parameter of interest is collate_fn. You can specify how exactly the samples need to be batched using collate_fn. However, the default collate should work fine for most use cases.
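For reference, this is roughly the usage I mean (a minimal sketch; the toy dataset and parameter values are my own, not from the tutorial):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# A toy dataset of 10 (feature, label) pairs.
dataset = TensorDataset(
    torch.arange(10, dtype=torch.float32).unsqueeze(1),  # features, shape (10, 1)
    torch.arange(10),                                    # labels, shape (10,)
)

# DataLoader replaces the plain for loop and adds batching, shuffling,
# and (via num_workers > 0) parallel loading in worker processes.
loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=0)

for features, labels in loader:
    # With 10 samples and batch_size=4, batches have 4, 4, and 2 samples.
    pass
```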
Could we possibly get a tutorial on custom data loading with the torch.utils.data.DataLoader class? More specifically, how to work with its parameters, especially num_workers and collate_fn. An explanation of how to inherit from the Dataset abstract base class and a template for collate_fn would be nice too.
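To make the request concrete, here is roughly the kind of template I have in mind (the dataset class and collate function below are hypothetical, just to illustrate the shape of it):

```python
import torch
from torch.utils.data import Dataset, DataLoader


class VariableLengthDataset(Dataset):
    """Hypothetical dataset of variable-length 1-D tensors."""

    def __init__(self, lengths):
        self.samples = [torch.ones(n) for n in lengths]

    def __len__(self):
        # Required: number of samples.
        return len(self.samples)

    def __getitem__(self, idx):
        # Required: return one sample.
        return self.samples[idx]


def pad_collate(batch):
    """Custom collate_fn: pad a list of 1-D tensors to the longest
    length in the batch and also return the original lengths."""
    lengths = torch.tensor([len(s) for s in batch])
    padded = torch.nn.utils.rnn.pad_sequence(batch, batch_first=True)
    return padded, lengths


# Default collate would fail on unequal lengths; pad_collate handles them.
loader = DataLoader(
    VariableLengthDataset([3, 5, 2, 4]),
    batch_size=2,
    collate_fn=pad_collate,
)
```

Something like this, with an explanation of when the default collate breaks down, would answer most of the questions I see asked.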
I am aware of this issue and this issue, but neither has led to a tutorial.
Again, I really appreciate the effort that goes into the tutorials that are currently available, but I feel that a tutorial on custom dataloaders would answer a lot of questions.