Hi,
I have a Dataset class that takes a pandas DataFrame (df). My __getitem__
method looks like this:
> def __getitem__(self, index):
>     x = self.df.iloc[index]['column_1']
>     a, b = self.some_function(x)
>     label = self.df.iloc[index]['label']
>     return a, b, label
When I pass the Dataset object to a DataLoader and generate a batch, with a batch size of 5 for example, does the DataLoader build the batch by looping over a list of 5 indices and fetching one data point at a time from __getitem__? Ideally, since I'm passing a DataFrame into my Dataset class, it would be quicker if index were a list like [0, 1, 2, 3, 4] rather than individual indices.
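To make the question concrete, here is a small sketch that records what index actually is on each call. RecordingDataset, the toy DataFrame, and the seen list are made up for illustration and are not my real code. As far as I can tell, with the default settings each call receives a single int, so one batch of 5 means 5 separate calls.

```python
import pandas as pd
from torch.utils.data import Dataset, DataLoader


class RecordingDataset(Dataset):
    """Toy dataset (hypothetical) that logs every index passed to __getitem__."""

    def __init__(self, df):
        self.df = df
        self.seen = []  # every `index` argument, in call order

    def __len__(self):
        return len(self.df)

    def __getitem__(self, index):
        self.seen.append(index)
        row = self.df.iloc[index]
        # Plain Python scalars; the default collate function stacks them into tensors.
        return float(row["column_1"]), int(row["label"])


df = pd.DataFrame({
    "column_1": [0.1 * i for i in range(10)],
    "label": [i % 2 for i in range(10)],
})
dataset = RecordingDataset(df)

# num_workers is left at 0 so that `seen` is filled in this process.
loader = DataLoader(dataset, batch_size=5, shuffle=False)
x, y = next(iter(loader))

print(dataset.seen)  # one single-int entry per sample in the batch
print(x.shape)       # the five samples collated into one tensor
```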
I ask because I'm currently bottlenecked on the CPU by the DataLoader. Any suggestions on how I could modify the code to slice my df into batches without looping over indices would be very welcome!
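One option I am considering, if I have read the torch.utils.data docs correctly, is to pass a BatchSampler as the sampler and set batch_size=None. That should disable automatic batching, so __getitem__ receives the whole list of indices at once. The sketch below is only that: BatchedFrameDataset and the toy DataFrame are hypothetical, and some_function is a stand-in that I am assuming can work on a whole batch.

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader, BatchSampler, SequentialSampler


class BatchedFrameDataset(Dataset):
    """Hypothetical variant whose __getitem__ takes a list of row positions."""

    def __init__(self, df):
        self.df = df

    def __len__(self):
        return len(self.df)

    def some_function(self, x):
        # Placeholder for the real transform; assumed to be vectorisable.
        return x * 2.0, x + 1.0

    def __getitem__(self, indices):
        rows = self.df.iloc[indices]  # one positional lookup for the whole batch
        x = torch.from_numpy(rows["column_1"].to_numpy(dtype=np.float32, copy=True))
        a, b = self.some_function(x)
        label = torch.from_numpy(rows["label"].to_numpy(dtype=np.int64, copy=True))
        return a, b, label


df = pd.DataFrame({
    "column_1": np.arange(10, dtype=np.float32),
    "label": np.arange(10) % 2,
})
dataset = BatchedFrameDataset(df)

# The BatchSampler yields lists of 5 indices; batch_size=None tells the
# DataLoader not to batch again, so each list goes straight to __getitem__.
sampler = BatchSampler(SequentialSampler(dataset), batch_size=5, drop_last=False)
loader = DataLoader(dataset, sampler=sampler, batch_size=None)

a, b, label = next(iter(loader))
print(a.shape, b.shape, label.shape)
```

For shuffling, swapping SequentialSampler for RandomSampler should work the same way. This only helps if some_function really can process a batch in one go; otherwise the per-sample loop just moves inside __getitem__.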
Thank you.