I have a Dataset class that I pass a pandas DataFrame (df) into. My __getitem__ method looks like this:
> def __getitem__(self, index):
>     x = self.df.iloc[index]['column_1']
>     a, b = self.some_function(x)
>     label = self.df.iloc[index]['label']
>     return a, b, label
When I pass the Dataset object to a DataLoader and generate a batch, with a batch size of 5 for example, does the DataLoader build the batch by looping over a list of 5 indices and fetching one data point at a time from __getitem__? Ideally, since I'm passing a dataframe into my Dataset class, it would be quicker if index were a list like [0, 1, 2, 3, 4] instead of being passed as individual indices.
I ask because right now I'm bottlenecked on the CPU by the DataLoader. Any suggestions on how I could modify the code to subset my df into batches without looping over indices would be very welcome!
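
To make the idea concrete, this is roughly the batched __getitem__ I have in mind. It is only a sketch using pandas on its own: the class name BatchedFrameDataset, the toy DataFrame, and the body of some_function are placeholders I made up for illustration, and it assumes some_function can operate on a whole array at once.

```python
import pandas as pd


class BatchedFrameDataset:
    """Hypothetical dataset whose __getitem__ accepts a list of row positions."""

    def __init__(self, df):
        self.df = df

    def __len__(self):
        return len(self.df)

    def some_function(self, x):
        # Placeholder for the real transform (assumption): returns two values.
        return x * 2, x + 1

    def __getitem__(self, indices):
        # One positional lookup for the whole batch instead of one per row.
        rows = self.df.iloc[indices]
        # Assumes some_function works element-wise on a NumPy array.
        a, b = self.some_function(rows['column_1'].to_numpy())
        labels = rows['label'].to_numpy()
        return a, b, labels


# Toy data, invented for this example only.
df = pd.DataFrame({
    'column_1': [10, 20, 30, 40, 50, 60],
    'label': [0, 1, 0, 1, 0, 1],
})
dataset = BatchedFrameDataset(df)

# Fetch a batch of 5 rows with a single call.
a, b, labels = dataset[[0, 1, 2, 3, 4]]
```

From what I can tell from the PyTorch docs, passing a BatchSampler as the sampler argument together with batch_size=None should make the DataLoader hand lists of indices like this to __getitem__, but I have not confirmed that it helps with my bottleneck.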