How to make an efficient Dataset for small patches from large images

I am trying to write a Dataset of image patches (50x50 in HxW) extracted from many (~100k) large images (3Kx3K in HxW).
In the current implementation, the __getitem__ function reads an image and randomly extracts a small patch; a rough sketch of the pattern is below. The overhead of loading and decoding the full image makes the Dataset very slow.
Since I have so many large images, they can't all be loaded into memory at once.
Is there any way to make the Dataset more efficient?
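
For reference, the pattern I mean looks roughly like this (a minimal sketch, not my exact code; the class name, PIL loading, and patch-size handling are just for illustration):

```python
import random
from PIL import Image
from torch.utils.data import Dataset

class LargeImagePatchDataset(Dataset):
    """Sketch of the current (slow) pattern: decode a full 3Kx3K image
    on every access just to take one 50x50 crop."""

    def __init__(self, image_paths, patch_size=50):
        self.image_paths = image_paths
        self.patch_size = patch_size

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        # Decoding the full-resolution image dominates the runtime here.
        img = Image.open(self.image_paths[idx]).convert("RGB")
        w, h = img.size
        x = random.randint(0, w - self.patch_size)
        y = random.randint(0, h - self.patch_size)
        return img.crop((x, y, x + self.patch_size, y + self.patch_size))
```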
Thanks!

Hi,

Could you crop the large images into small patches as a preprocessing step, and then build your custom dataset over the small patch files normally? This approach separates the extraction phase from the loading phase, but it is not complex, so you could give it a try.
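
As a minimal sketch of that idea (the directory layout, the .png extension, and the patches-per-image count are assumptions for illustration, not requirements):

```python
import random
from pathlib import Path
from PIL import Image
from torch.utils.data import Dataset

def extract_patches(src_dir, dst_dir, patch_size=50, patches_per_image=100):
    """One-off preprocessing pass: save random crops as individual files
    so the training-time __getitem__ only has to decode a tiny image."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for img_path in Path(src_dir).glob("*.png"):
        img = Image.open(img_path).convert("RGB")
        w, h = img.size
        for i in range(patches_per_image):
            x = random.randint(0, w - patch_size)
            y = random.randint(0, h - patch_size)
            patch = img.crop((x, y, x + patch_size, y + patch_size))
            patch.save(dst / f"{img_path.stem}_{i}.png")

class PatchDataset(Dataset):
    """Plain dataset over the pre-cropped patch files."""

    def __init__(self, patch_dir, transform=None):
        self.paths = sorted(Path(patch_dir).glob("*.png"))
        self.transform = transform

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        patch = Image.open(self.paths[idx]).convert("RGB")
        return self.transform(patch) if self.transform else patch
```

You can then point a regular DataLoader at PatchDataset; since each saved file is only 50x50, decoding per item becomes cheap.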

Best regards