How to send all data to the GPU before training

I want to know why training loops are usually written like this:

```python
for i, (images, labels) in enumerate(train_loader):
    images = images.to(device)
    labels = labels.to(device)
    # ... training step ...
```

which transfers each batch from CPU to GPU inside the training loop. This may consume a lot of time, especially when the network and the input data are small. Can all the data be sent to the GPU before training? How can I do this?


Sure, just push the data to your GPU in your Dataset and use num_workers=0, since as far as I know multiple workers would otherwise try to initialize CUDA multiple times, which would yield an error.
If you are lazily loading your data in __getitem__, you would need to change this behavior and load all your data in the __init__ of your Dataset (or beforehand).
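A minimal sketch of that approach, using synthetic data (the dataset class, shapes, and sizes here are placeholders, not from the original post):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class GPUDataset(Dataset):
    """Loads the whole dataset into device memory once, in __init__."""
    def __init__(self, device):
        # Everything is created/moved up front instead of lazily in __getitem__.
        # Replace the random tensors with your real data.
        self.images = torch.randn(100, 3, 32, 32, device=device)
        self.labels = torch.randint(0, 10, (100,), device=device)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # Tensors already live on the device; no per-batch copy needed.
        return self.images[idx], self.labels[idx]

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# num_workers=0 is required here: worker processes cannot re-initialize CUDA.
loader = DataLoader(GPUDataset(device), batch_size=10, num_workers=0)

for images, labels in loader:
    # No images.to(device) / labels.to(device) inside the loop anymore.
    assert images.device.type == device.type
```

Note this only works if the whole dataset fits in GPU memory; for larger datasets the usual per-batch transfer (possibly with pin_memory=True) is the way to go.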