I have a few TB of tiny images and I would like to scan/classify them with a model I have trained. Currently, I load a bunch into memory, create a DataLoader object, run them through the model, and move on to the next bunch. By using the DataLoader I get to reuse the same image transformations I used for training. However, this is a little painful. Is there a better way of running a model over millions of files?
You could use a Dataset to lazily load all images. Have a look at this tutorial. Basically you can pass the image paths to your Dataset and just load and transform the samples in the `__getitem__` method, so only the current batch is ever held in memory.
That was easy; I got it to work. I keep forgetting how flexible PyTorch is compared to other frameworks. Thanks @ptrblck.