This may be a trivial question, but I could not find a clear answer online.
I'm working with datasets of large images (loading each image takes about a second), but the whole dataset fits in memory, so I have decided to store the images in RAM instead of reading them from disk on the fly.
I have a custom dataset class, and so far I'm storing the raw images in a Manager.dict(); in `__getitem__` I run augmentations/processing in numpy on these images before returning a tensor. So my questions are:
- Can I use a plain Python dictionary to store these images? (i.e., is it read/write safe when accessed through a DataLoader with multiple workers?)
- Same question for a single large numpy array that would contain all the images?
- Do you think it's faster to load all images once at instantiation of the dataset, or to load and cache each sample in RAM during its first `__getitem__` call?
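
For context, the lazy-caching variant I have in mind looks roughly like this (a sketch: `_load` is a hypothetical stand-in for the slow image read, and in practice the class would subclass `torch.utils.data.Dataset`):

```python
import numpy as np

class CachedImageDataset:
    """Caches each raw image in RAM on first access (lazy caching).

    In real code this would subclass torch.utils.data.Dataset; the cache
    shown here is a plain per-process dict, so with num_workers > 0 each
    worker process would fill its own independent copy.
    """

    def __init__(self, paths):
        self.paths = paths
        self.cache = {}  # index -> raw image array

    def _load(self, idx):
        # Placeholder for the ~1 s disk read of self.paths[idx].
        return np.zeros((4, 4), dtype=np.uint8)

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        if idx not in self.cache:        # first access: load and cache
            self.cache[idx] = self._load(idx)
        img = self.cache[idx]
        # numpy augmentations/processing would go here,
        # followed by conversion to a tensor.
        return img
```
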
Thank you in advance,