Why does the CIFAR-10 tutorial move each mini-batch to the GPU on every iteration instead of keeping the whole dataset on the GPU from the start? Isn't CIFAR-10 small enough to fit entirely in GPU memory? Or is it just so the example generalizes to larger datasets like ImageNet?
http://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html
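To make the contrast concrete, here's a rough sketch of the two approaches I mean. The first half mirrors what the tutorial's training loop does; the second half is my own guess at how a "preload everything onto the GPU" variant could look (names like `all_images` / `all_labels` are just mine, not from the tutorial):

```python
import torch
import torchvision
import torchvision.transforms as transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
trainset = torchvision.datasets.CIFAR10(root="./data", train=True,
                                        download=True, transform=transform)

# --- What the tutorial does: copy one mini-batch host -> GPU per iteration ---
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True)
for inputs, labels in trainloader:
    inputs, labels = inputs.to(device), labels.to(device)
    # ... forward / backward / optimizer step ...
    break  # one iteration is enough for the sketch

# --- Alternative I'm asking about: preload the whole training set once ---
# 50000 x 3 x 32 x 32 float32 is roughly 600 MB, so it should fit on most GPUs.
all_images = torch.stack([trainset[i][0] for i in range(len(trainset))]).to(device)
all_labels = torch.tensor(trainset.targets).to(device)

batch_size = 4
perm = torch.randperm(len(trainset), device=device)
for start in range(0, len(trainset), batch_size):
    idx = perm[start:start + batch_size]
    inputs, labels = all_images[idx], all_labels[idx]
    # ... forward / backward / optimizer step, no host-to-device copy here ...
    break
```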