If you have ever trained a model, you may have seen the DataLoader (I am using the PyTorch DataLoader) become the bottleneck, because every time you fetch a training sample from the dataset, the data transformation is performed on-the-fly. Here I take the __getitem__ function of DatasetFolder from torchvision.datasets as an example.
def __getitem__(self, index):
    path, target = self.samples[index]
    sample = self.loader(path)
    if self.transform is not None:
        sample = self.transform(sample)
    if self.target_transform is not None:
        target = self.target_transform(target)
    return sample, target
I wonder whether we can pre-process the images (e.g. ImageNet) into tensors in advance and save them to disk. Then we modify the __getitem__ function to read these tensors directly from disk. How efficient is this approach? Has anyone tried this solution before?
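To make the idea concrete, here is a minimal sketch of such a dataset. It assumes a hypothetical directory layout where each sample was transformed once offline and saved as a `(tensor, target)` pair with torch.save; the class name and file naming are my own, not from any library.

```python
import os
import torch
from torch.utils.data import Dataset

class PreprocessedTensorDataset(Dataset):
    """Reads samples that were transformed once and saved as .pt files."""

    def __init__(self, tensor_dir):
        # Hypothetical layout: one .pt file per sample in tensor_dir.
        self.paths = sorted(
            os.path.join(tensor_dir, f)
            for f in os.listdir(tensor_dir)
            if f.endswith(".pt")
        )

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        # No decoding or transforms here: each file already holds a
        # (sample_tensor, target) pair produced by the offline pass.
        sample, target = torch.load(self.paths[index])
        return sample, target
```

Whether this is faster than on-the-fly transforms depends on whether reading the (much larger) uncompressed tensors from disk beats decoding, which the replies below address.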
I think the loading from disk will be a burden and will likely become a new bottleneck (instead of the data transform we had before). Another concern is size: for example, one ImageNet image takes 74 MB when saved as a tensor using the standard transformations.
I think you are spot-on here. In my experience, decoding compressed image files is often much faster than loading uncompressed tensors from disk.
That said, you might look into pre-resizing to a common size (maybe a bit larger than the final one), saving the results as images, doing only ToTensor in the dataset, and running everything else on the GPU. That can give great speedups.