Profiling the training of my model with the PyTorch profiler, I noticed that most of the time is spent on the CPU (unfortunately, the exported trace shows me nothing; the TensorBoard page stays blank).
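For reference, this is roughly how I set up the profiler (simplified; `train_loader`, `train_step`, and the `./tb_logs` directory are just placeholders for my actual training loop):

```python
import torch
from torch.profiler import ProfilerActivity, profile, schedule, tensorboard_trace_handler

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    schedule=schedule(wait=1, warmup=1, active=3),
    on_trace_ready=tensorboard_trace_handler("./tb_logs"),  # trace for the TensorBoard plugin
) as prof:
    for step, batch in enumerate(train_loader):
        train_step(batch)  # forward / backward / optimizer step
        prof.step()        # advance the profiler schedule
        if step >= 5:
            break

# this table is what tells me most of the time is spent on the CPU
print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))
```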
I am fairly sure the overhead comes from data loading. The dataset consists of two folders, one with the .json annotation files and the other with the .jpg images. Given that the annotations fit entirely in RAM while the images do not, how can data loading be made more efficient?
```python
# in __init__ of my torch.utils.data.Dataset subclass:
self.loader = torchvision.io.read_image

def __getitem__(self, index: int) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
    actual_index = self.indices[index]
    annotation = self.annotations[actual_index]  # annotations are fully preloaded in RAM
    img_path = os.path.join(self.images_path, annotation["filename"])
    # the annotation filename ends in .json; swap the extension to get the matching jpg
    img = self.loader(img_path.replace(".json", ".jpg"))
    bboxes = annotation["bboxes"]
    mask = annotation["mask"]
    return img, bboxes, mask
```
I tried varying the batch size, but GPU utilization stays almost the same (low).
I have also tried using multiple persistent workers, but the wait time between epochs is quite long. My current DataLoader setup is roughly the one sketched below.
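(The exact values here are just what I have been experimenting with; `dataset` is the Dataset shown above.)

```python
from torch.utils.data import DataLoader

train_loader = DataLoader(
    dataset,
    batch_size=32,            # I varied this without seeing better GPU utilization
    shuffle=True,
    num_workers=8,            # multiple worker processes
    persistent_workers=True,  # keep workers alive between epochs
    pin_memory=True,          # speeds up host-to-GPU copies
)
```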
I wonder if it is possible to prefetch images from disk during training, i.e., while the GPU is busy. Alternatively, instead of going through `__getitem__` one sample at a time, is there a way to pass all the indices of a batch to the dataset so the images can be loaded together? And in general, are there more efficient approaches?
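For the batched-indices idea, my understanding of the DataLoader docs is that disabling automatic batching (`batch_size=None`) and passing a `BatchSampler` as the sampler makes the loader call `dataset[list_of_indices]` once per batch. Is something along these lines (with a hypothetical batched `__getitem__`) a reasonable direction?

```python
import os
from torch.utils.data import BatchSampler, DataLoader, RandomSampler

# With batch_size=None and a BatchSampler as `sampler`, the DataLoader calls
# dataset[list_of_indices] once per batch instead of once per sample.
batch_sampler = BatchSampler(RandomSampler(dataset), batch_size=32, drop_last=False)

loader = DataLoader(
    dataset,
    sampler=batch_sampler,
    batch_size=None,          # disable automatic batching
    num_workers=8,
    persistent_workers=True,
    pin_memory=True,
)

# __getitem__ would then receive the whole list of batch indices:
def __getitem__(self, indices):
    imgs, bboxes, masks = [], [], []
    for i in indices:
        annotation = self.annotations[self.indices[i]]
        img_path = os.path.join(self.images_path, annotation["filename"])
        imgs.append(self.loader(img_path.replace(".json", ".jpg")))
        bboxes.append(annotation["bboxes"])
        masks.append(annotation["mask"])
    return imgs, bboxes, masks
```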