Let’s say I have a large image that is the bottleneck of my image-loading pipeline. Normally you set up your dataset and specify in `__getitem__` how to load each image. I want to load this image once, but get 10 random crops from it. So I essentially want some yield loop inside the `__getitem__` function, but with PyTorch, so I can use batch sizes > 1 and parallelism. How can I do this?
I don’t think you can do that inside the `__getitem__` function. But if you create a list like `self.data`, you can append multiple copies (or references) to this image in `__init__`. Then, in `__getitem__`, you retrieve items from that list and take a random crop per index.
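A minimal sketch of that idea, with hypothetical names (`RepeatedCropDataset`, `crops_per_image`): the dataset advertises N entries that all point at the one image loaded in `__init__`, and each index yields a fresh random crop. A nested list stands in for a real tensor here to keep the example self-contained; in practice you would subclass `torch.utils.data.Dataset` and return tensors so a `DataLoader` can batch the crops.

```python
import random

class RepeatedCropDataset:
    """Expose one large image as `crops_per_image` random-crop samples."""

    def __init__(self, image, crop_size, crops_per_image=10):
        self.image = image              # loaded once, reused for every index
        self.crop_size = crop_size
        self.crops_per_image = crops_per_image

    def __len__(self):
        # The dataset "length" is the number of crops, not the number of images.
        return self.crops_per_image

    def __getitem__(self, idx):
        # Each index draws an independent random crop of the same image.
        h, w = len(self.image), len(self.image[0])
        top = random.randint(0, h - self.crop_size)
        left = random.randint(0, w - self.crop_size)
        return [row[left:left + self.crop_size]
                for row in self.image[top:top + self.crop_size]]

# Toy 100x100 "image"; replace with a decoded tensor in real code.
big_image = [[r * 100 + c for c in range(100)] for r in range(100)]
ds = RepeatedCropDataset(big_image, crop_size=32, crops_per_image=10)
crop = ds[0]
```

With a real `torch.utils.data.Dataset` subclass, `DataLoader(ds, batch_size=10)` would then hand you all 10 crops as one batch.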
You can add a cache to your dataset class. However, do note that if you are using multiprocess data loading (i.e., `num_workers > 0`), each worker process gets an independent cache, unless you add some inter-process communication mechanism such as shared-memory objects or a local database.
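A sketch of such a per-process cache, under assumptions not in the original (a hypothetical `_load` method standing in for the expensive decode, and a `load_count` counter added purely to demonstrate cache hits):

```python
class CachedDataset:
    """Dataset whose expensive loads are cached in a per-process dict."""

    def __init__(self, paths):
        self.paths = paths
        self._cache = {}       # lives in whichever process owns this instance
        self.load_count = 0    # instrumentation only: counts real loads

    def _load(self, path):
        # Stand-in for an expensive decode (e.g. opening and decoding a PNG).
        self.load_count += 1
        return f"pixels-of-{path}"

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        path = self.paths[idx]
        if path not in self._cache:      # only load each path once
            self._cache[path] = self._load(path)
        return self._cache[path]

# Two indices share "a.png", so only two real loads happen, not three.
ds = CachedDataset(["a.png", "a.png", "b.png"])
samples = [ds[i] for i in range(3)]
```

Note that with `num_workers > 0`, each DataLoader worker holds its own copy of `self._cache`, so memory use (and cache misses) scale with the worker count, which is exactly the caveat above.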