Hello everyone, I have a validation dataset of approximately 1800 images, and I’ve built this dataset based on the CocoDataset. I’ve loaded this dataset into a PyTorch DataLoader using the following code.
You could profile the data loading pipeline and try to narrow down where the bottleneck is. E.g. the actual loading from the drive could be slow (especially if you are using a spinning HDD), or the data processing could be the bottleneck if your CPU is not fast enough.
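As a rough first pass, you can simply time how long each iteration waits on the loader before doing anything else. This is a minimal sketch; the `TensorDataset` here is a hypothetical stand-in for your CocoDataset-based loader, so swap in your own `data_loader`:

```python
import time

import torch
from torch.utils.data import DataLoader, TensorDataset

# hypothetical stand-in for the CocoDataset-based validation set
dataset = TensorDataset(torch.randn(256, 3, 32, 32))
data_loader = DataLoader(dataset, batch_size=32, num_workers=0)

end = time.perf_counter()
for i, batch in enumerate(data_loader):
    # time spent blocked waiting for the next batch
    load_time = time.perf_counter() - end
    print(f"batch {i}: waited {load_time * 1000:.1f} ms for data")
    # ... validation forward pass would go here ...
    end = time.perf_counter()
```

If the waits stay large after the first batch, the loader (disk I/O or CPU-side transforms) is likely the bottleneck; you could then experiment with `num_workers` to see whether the wait shrinks.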
How can data loading be profiled? I tried the approach below, but I never see any results and can't get past the first iteration.
from torch.profiler import profile, ProfilerActivity

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
             record_shapes=True, profile_memory=True) as prof:
    for obj in data_loader:  # iterate so the loading work is recorded
        pass
print(prof.key_averages().table(sort_by="cpu_time_total"))