Idiomatic Dataset/loader design for a segmentation task

I have a dataset of ~1k images, each about 60MB on disk. I want to train a UNet-like model on patches of the images, but I am unsure about the best way to construct a training Dataset to feed the model. The images are too large to all fit in RAM at once, and loading from disk is too slow for every training sample to come from a freshly loaded image.

I am hesitant to have an entire batch come from the same image, since I’m worried that the correlation between patches drawn from the same image will hurt training.

What’s the best way to design a Dataset in this situation?

I’m currently thinking that each Dataset instance would keep a short list of images in RAM, pull a random patch from the Nth cached image on each access, and with a low probability replace that image with a freshly loaded one. Amortized, that should be fast, but it seems over-complicated, hence asking here.
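
Roughly what I have in mind, as a minimal untested sketch (the `load_pair` callable and the `cache_size`/`patch_size`/`swap_prob`/`samples_per_epoch` values are placeholders I'd tune):

```python
import random

import numpy as np
import torch
from torch.utils.data import Dataset


class CachedPatchDataset(Dataset):
    """Serve random patches from a small RAM cache of full images.

    `load_pair(path)` is a placeholder that should return an (image, mask)
    pair as (C, H, W) and (H, W) numpy arrays.
    """

    def __init__(self, image_paths, load_pair, cache_size=8,
                 patch_size=256, swap_prob=0.01, samples_per_epoch=10_000):
        self.image_paths = list(image_paths)
        self.load_pair = load_pair
        self.cache_size = cache_size
        self.patch_size = patch_size
        self.swap_prob = swap_prob
        self.samples_per_epoch = samples_per_epoch
        # Pre-fill the cache with randomly chosen images.
        self.cache = [load_pair(random.choice(self.image_paths))
                      for _ in range(cache_size)]

    def __len__(self):
        return self.samples_per_epoch

    def __getitem__(self, idx):
        slot = idx % self.cache_size
        # Low-probability eviction: swap this slot for a freshly loaded image.
        if random.random() < self.swap_prob:
            self.cache[slot] = self.load_pair(random.choice(self.image_paths))
        image, mask = self.cache[slot]
        _, h, w = image.shape
        # Crop a random patch, applying the same window to image and mask.
        top = random.randint(0, h - self.patch_size)
        left = random.randint(0, w - self.patch_size)
        img_patch = image[:, top:top + self.patch_size, left:left + self.patch_size]
        mask_patch = mask[top:top + self.patch_size, left:left + self.patch_size]
        return (torch.from_numpy(np.ascontiguousarray(img_patch)),
                torch.from_numpy(np.ascontiguousarray(mask_patch)))
```

Since consecutive indices map to different cache slots, a batch would mix patches from several images, and with `num_workers > 0` each DataLoader worker holds its own copy of the cache, so RAM use is roughly `cache_size × num_workers` images.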