Fetching data for the next epoch

Hi. I would like some explanation of the behavior of the DataLoader.
This is my setup: I have a batch size of 96 and, given my dataset, one epoch takes 27 iterations. Since I have many cores at my disposal, I set num_workers to 27 (I know that having so many workers is not particularly efficient, but it's for the sake of the demonstration).
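
To make the numbers concrete, here is a small sketch of the arithmetic behind this setup. It is not DataLoader code: the helper name batches_per_worker is hypothetical, and it assumes (as far as I understand the multi-process loader) that batches are handed to workers round-robin, that up to prefetch_factor * num_workers batches are requested ahead, and that drop_last is False.

```python
from collections import Counter


def batches_per_worker(num_batches, num_workers):
    """Count batches per worker in one epoch.

    Assumption: batch i is assigned to worker i % num_workers (round-robin).
    """
    return Counter(i % num_workers for i in range(num_batches))


batch_size = 96
num_batches = 27      # iterations per epoch, from my setup
num_workers = 27
prefetch_factor = 2   # PyTorch default when num_workers > 0

# Dataset sizes that give exactly 27 batches of 96 (assuming drop_last=False).
min_samples = (num_batches - 1) * batch_size + 1
max_samples = num_batches * batch_size

load = batches_per_worker(num_batches, num_workers)

# Maximum number of batches requested ahead of consumption (assumed behavior).
prefetch_budget = prefetch_factor * num_workers

print(min_samples, max_samples)        # 2497 2592
print(max(load.values()))              # 1
print(prefetch_budget >= num_batches)  # True
```

If those assumptions hold, every worker receives its single batch right at the start of the epoch, and the whole epoch (27 batches) fits inside the prefetch budget (54 batches), so there is nothing left in the current epoch for the workers to fetch afterwards.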

I can then see through prints that all 27 processes fetch images until they reach 96. At that point, the data is copied to the GPU with .to() and processed by the model, which takes time. But while the GPU is working at 90%, all my CPU cores sit idle until the epoch is complete. I would have expected them to load the data for the next epoch while the GPU does its computations, but they do not. I tried changing prefetch_factor, persistent_workers and pin_memory, but from what I understand, they are not really related to this.

Could someone tell me what limitation prevents the batches for the next epoch from being loaded while the previous epoch is not over yet? As far as I understand, the loading done in Dataset.__getitem__() happens on the CPU and in host RAM, so it should not conflict with the GPU work? Thanks!