Does DataLoader iterate through indexes to generate a batch?


I have a Dataset class that takes a pandas DataFrame (df) as input. My __getitem__ method is shown below.

>   def __getitem__(self, index):
>       x = self.df.iloc[index]['column_1']
>       a, b = self.some_function(x)
>       label = self.df.iloc[index]['label']
>       return a, b, label

When I pass the Dataset object to a DataLoader and generate a batch, with a batch size of 5 for example, does the DataLoader build the batch by looping over a list of 5 indices and fetching one data point at a time from __getitem__? Since I’m passing a DataFrame into my Dataset class, it would be quicker if index were a list like [0, 1, 2, 3, 4] rather than a single index per call.

I ask because my DataLoader is currently CPU-bound. Any suggestions on how I could modify the code to slice my df into batches without looping over indices would be very welcome!

Thank you.


Have a look at this code to see how to provide a list of indices to your Dataset.
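
To answer the first part: yes, with the default settings the DataLoader calls __getitem__ once per index and then collates the samples into a batch. Below is a minimal sketch of one way to pass a whole list of indices instead (not necessarily identical to the linked code). It wraps a sampler in a BatchSampler and sets batch_size=None so that automatic batching is disabled and each list of indices goes straight to __getitem__. The class name BatchedFrameDataset, the toy DataFrame, and the arithmetic standing in for some_function are assumptions made for illustration only.

```python
import pandas as pd
import torch
from torch.utils.data import BatchSampler, DataLoader, Dataset, SequentialSampler


class BatchedFrameDataset(Dataset):
    """Hypothetical dataset whose __getitem__ receives a list of row positions."""

    def __init__(self, df):
        self.df = df

    def __len__(self):
        return len(self.df)

    def __getitem__(self, indices):
        # `indices` is a list such as [0, 1, 2, 3, 4], so a single iloc call
        # fetches every row of the batch at once.
        rows = self.df.iloc[indices]
        x = torch.tensor(rows["column_1"].to_numpy(), dtype=torch.float32)
        # Stand-in for some_function (assumption): any vectorized transform.
        a, b = x * 2, x + 1
        label = torch.tensor(rows["label"].to_numpy(), dtype=torch.long)
        return a, b, label


# Toy data, for illustration only.
df = pd.DataFrame(
    {
        "column_1": [float(i) for i in range(12)],
        "label": [i % 2 for i in range(12)],
    }
)
dataset = BatchedFrameDataset(df)

# The BatchSampler yields lists of indices; batch_size=None tells the
# DataLoader not to batch again on top of that.
sampler = BatchSampler(SequentialSampler(dataset), batch_size=5, drop_last=False)
loader = DataLoader(dataset, sampler=sampler, batch_size=None)

batches = list(loader)
for a, b, label in batches:
    print(a.shape, b.shape, label.shape)
```

Swapping SequentialSampler for RandomSampler gives shuffled batches. Note that this only helps if some_function can process a whole batch at once; if it still loops over rows internally, the bottleneck just moves.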

Thank you very much!

Cool, thank you! I was looking for that as well.