How to prefetch data when processing with GPU?

We already have prefetching (see the imagenet or dcgan examples), but we don't prefetch directly onto the GPU. We prefetch onto the CPU, do data augmentation there, and then put the mini-batch in CUDA pinned memory (on the CPU) so that the transfer to the GPU is very fast. Then we hand the batch to the training loop, which transfers it to the GPU and trains on it.
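For reference, here is a minimal sketch of that pipeline using `torch.utils.data.DataLoader`. It is not copied from the imagenet or dcgan examples: the synthetic dataset, the tensor shapes, and the helper names `make_loader` and `train_one_epoch` are made up for illustration. `num_workers` gives you CPU-side prefetching, `pin_memory=True` puts each batch in pinned memory, and `non_blocking=True` lets the copy to the GPU run asynchronously.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def make_loader(num_workers=2, batch_size=32):
    # Synthetic stand-in for a real image dataset (hypothetical shapes).
    # With a real dataset, augmentation would run in __getitem__ on the CPU workers.
    images = torch.randn(256, 3, 32, 32)
    labels = torch.randint(0, 10, (256,))
    dataset = TensorDataset(images, labels)
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        # Worker processes prepare upcoming batches on the CPU while the model trains.
        num_workers=num_workers,
        # Place each batch in page-locked (pinned) host memory when a GPU is present.
        pin_memory=torch.cuda.is_available(),
    )


def train_one_epoch(loader, model, optimizer, device):
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    steps = 0
    for images, labels in loader:
        # Copies from pinned memory can be asynchronous with respect to the host.
        images = images.to(device, non_blocking=True)
        labels = labels.to(device, non_blocking=True)
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
        steps += 1
    return steps


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    print(train_one_epoch(make_loader(), model, optimizer, device))
```

On a machine without CUDA this still runs; `pin_memory` is just turned off and `non_blocking` has no effect.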
