I want to know why training loops are usually written like this:
for i, (images, labels) in enumerate(train_loader):
    images = images.to(device)
    labels = labels.to(device)
This transfers each batch from CPU to GPU inside the training loop. The transfer can take a significant share of each iteration's time, especially when the network and the input data are small. Is it possible to send all the data to the GPU once, before training starts? How can I do this?
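One way to answer my own question, as a sketch: if the whole dataset fits in GPU memory, you can move the tensors to the device once and wrap them in a `TensorDataset`, so the `DataLoader` yields batches that are already on the GPU. The dataset shapes and sizes below are made up for illustration; the pattern is what matters.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical small dataset: 1000 RGB images of 32x32, 10 classes
images = torch.randn(1000, 3, 32, 32)
labels = torch.randint(0, 10, (1000,))

# Move the entire dataset to the device once, before training
images = images.to(device)
labels = labels.to(device)

dataset = TensorDataset(images, labels)
# num_workers must stay 0: DataLoader worker processes should not
# handle CUDA tensors
loader = DataLoader(dataset, batch_size=64, shuffle=True, num_workers=0)

for batch_images, batch_labels in loader:
    # Batches are already on `device`; no per-iteration .to(device) needed
    pass
```

The trade-off is memory: this only works when the full dataset fits on the GPU alongside the model and activations. When it does not fit, the usual per-batch transfer can still be sped up with `pin_memory=True` on the `DataLoader` and `.to(device, non_blocking=True)` on the batches.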