So I’m using a script to turn a directory of images (split across 5 class subdirectories) into a single image tensor of size (730, 3, 256, 256) and a label tensor of size (730, 5) for the 5 classes, and then using torch.utils.data to wrap those in a TensorDataset and produce shuffled batches. Each batch is then moved to the GPU individually as I iterate over the dataset during training.
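For concreteness, here's roughly what my current pipeline looks like (the random tensors stand in for the images my script loads from disk; shapes match my actual data):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Stand-in for the images loaded from disk: the ENTIRE dataset
# lives in system memory as two tensors.
images = torch.randn(730, 3, 256, 256)
labels = torch.zeros(730, 5)  # one-hot labels for 5 classes
labels[torch.arange(730), torch.randint(0, 5, (730,))] = 1.0

dataset = TensorDataset(images, labels)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
for batch_images, batch_labels in loader:
    # each batch is moved to the GPU individually during training
    batch_images = batch_images.to(device)
    batch_labels = batch_labels.to(device)
    break  # (training step would go here)
```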
However, this isn’t tenable for a very large dataset, since the whole thing has to fit in system memory at once. Is there a better approach that I’m not seeing in the docs? It seems like there should be a simpler way to read images from disk into shuffled batches than first packing everything into two giant in-memory tensors.