why does the cifar10 tutorial move each batch to the GPU every iteration instead of keeping the whole dataset on the GPU from the start? Isn't cifar10 small enough to do that? Or is it just so that the example generalizes to ImageNet?
You usually would want to move each batch to the GPU per iteration, since that saves a lot of your valuable GPU memory.
Of course, with a tiny dataset like CIFAR-10 you could push all the data to the GPU up front, but the tutorial shows the approach that also scales to datasets (like ImageNet) that don't fit in GPU memory.
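A rough back-of-the-envelope sketch of why this matters (the dataset sizes and float32 storage are assumptions, not something the tutorial states): CIFAR-10 held as one big tensor is well under a gigabyte, while ImageNet at its usual 224x224 crop size is hundreds of gigabytes.

```python
# Approximate raw memory footprint of a dataset stored as one float32 tensor.
# Assumed sizes: 50,000 CIFAR-10 training images at 3x32x32, and roughly
# 1.28M ImageNet training images at 3x224x224 (hypothetical round numbers).
BYTES_PER_FLOAT32 = 4

def dataset_bytes(num_images, channels, height, width):
    """Bytes needed to hold the whole dataset as a single float32 tensor."""
    return num_images * channels * height * width * BYTES_PER_FLOAT32

cifar10 = dataset_bytes(50_000, 3, 32, 32)
imagenet = dataset_bytes(1_280_000, 3, 224, 224)

print(f"CIFAR-10: {cifar10 / 1e9:.2f} GB")  # well under 1 GB -> fits on most GPUs
print(f"ImageNet: {imagenet / 1e9:.0f} GB")  # hundreds of GB -> nowhere near fitting
```

So preloading is feasible for CIFAR-10 specifically, but the per-batch `.to(device)` pattern is the one that keeps working once you swap in a larger dataset.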