PyTorch Dataset: why is it so slow?

I extended `Dataset` with `class CyDataSet(Dataset)`, which basically reads data from files.
In `__init__` I load all files into memory.
In `__getitem__` I translate the index into a position in my cached data, index directly into it, and return the sample:
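```python
import bisect

def __getitem__(self, index):  # method of CyDataSet
    # Locate the cached file ("space") that contains the global index.
    ssidx = bisect.bisect_right(self.cumulative_sums, index)
    ss = self.sslist[ssidx]
    # Convert the global index into a position inside that file.
    space_start_index = self.cumulative_sums[ssidx - 1] if ssidx > 0 else 0
    relative_position = index - space_start_index
    # file_key is not defined in the original snippet; presumably it is derived from ss.
    features_tensors, target_tensors = self.pdfCache[file_key]
    X = features_tensors[relative_position]
    Y = target_tensors[relative_position]
    symbol_id = self.symbolTensorMap[ss.symbol]
    return X, symbol_id, Y
```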
It's this simple, yet with batch size = 2048 the `__getitem__` calls for one batch take around 1.2 s, while training and backprop take about 1 s on CPU. Why is `__getitem__` so slow? Is other framework code involved, and why is there no batch `__getitem__`?
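On the batching question: with automatic batching, `DataLoader` calls `dataset[idx]` once per index (2048 Python calls per batch here) and then runs the collate function, which stacks the samples. A common workaround is to pass a `BatchSampler` as the `sampler` and set `batch_size=None`, so `__getitem__` receives a whole list of indices at once. The sketch below assumes one `(features, targets)` tensor pair and one symbol id per file; `BatchedCyDataset`, `make_batched_loader`, `features_list`, `targets_list` and `symbol_ids` are hypothetical names, not part of the original code.

```python
import torch
from torch.utils.data import BatchSampler, DataLoader, Dataset, RandomSampler


class BatchedCyDataset(Dataset):
    """Hypothetical batched variant of CyDataSet: __getitem__ takes a list of indices."""

    def __init__(self, features_list, targets_list, symbol_ids):
        # Assumption: one (features, targets) tensor pair and one symbol id per file.
        # Concatenating once up front means a global index addresses a row directly.
        self.features = torch.cat(features_list)
        self.targets = torch.cat(targets_list)
        lengths = torch.tensor([len(f) for f in features_list])
        self.cumulative_sums = torch.cumsum(lengths, dim=0)  # end offset of each file
        self.symbol_ids = torch.tensor(symbol_ids)

    def __len__(self):
        return int(self.cumulative_sums[-1])

    def __getitem__(self, indices):
        # Expects a list of global indices (one whole batch), not a single int.
        idx = torch.as_tensor(indices, dtype=torch.long)
        # Vectorized equivalent of bisect.bisect_right: which file each index is in.
        file_idx = torch.searchsorted(self.cumulative_sums, idx, right=True)
        # One tensor-indexing call per field instead of one Python lookup per sample.
        return self.features[idx], self.symbol_ids[file_idx], self.targets[idx]


def make_batched_loader(dataset, batch_size=2048):
    # BatchSampler yields lists of indices; batch_size=None disables automatic
    # batching, so each list is passed to dataset[...] as-is and no per-sample
    # stacking takes place.
    sampler = BatchSampler(RandomSampler(dataset), batch_size=batch_size, drop_last=False)
    return DataLoader(dataset, sampler=sampler, batch_size=None)
```

Iterating then looks like `for X, symbol_id, Y in make_batched_loader(ds): ...`. Concatenating everything in `__init__` temporarily needs extra memory; if that is a concern, the per-file tensors can be kept and the indices grouped by `file_idx` instead.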

You could profile your code to narrow down where the bottleneck is, as it's unclear to me whether the `__getitem__` call itself is slow or one of the calls you make inside it.
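A minimal profiling sketch using the standard-library `cProfile` and `pstats` modules, assuming `num_workers=0` so that `__getitem__` and collation run in the main process where the profiler can see them. `ToyDataset` and `profile_call` are hypothetical stand-ins; swap in `CyDataSet` to profile the real code. The report lists `__getitem__` separately from the collate functions, which shows how much time the framework itself adds per batch.

```python
import cProfile
import io
import pstats

import torch
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Hypothetical stand-in for CyDataSet: one sample per __getitem__ call."""

    def __init__(self, n=100_000, n_features=32):
        self.X = torch.randn(n, n_features)
        self.Y = torch.randn(n)

    def __len__(self):
        return len(self.X)

    def __getitem__(self, index):
        return self.X[index], self.Y[index]


def profile_call(fn, top=20):
    """Run fn() under cProfile; return its result and a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = fn()
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out).sort_stats("cumulative")
    if top is None:
        stats.print_stats()  # every function that was called
    else:
        stats.print_stats(top)  # only the `top` most expensive entries
    return result, out.getvalue()


if __name__ == "__main__":
    # num_workers=0 keeps __getitem__ and collation in this process, so cProfile sees them.
    loader = DataLoader(ToyDataset(), batch_size=2048, shuffle=True, num_workers=0)
    batch, report = profile_call(lambda: next(iter(loader)))
    print(report)
```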