How to push as much data as possible to the GPU?

I’m a former Theano user. In Theano, if you wanted to use the GPU and had enough GPU memory available, you would store your whole dataset in a theano.shared() variable to avoid going back and forth between CPU and GPU.

With PyTorch I’m not sure whether there’s an equivalent to this… Let’s say I’m using a DataLoader to iterate over my dataset in minibatches. In the examples I’ve seen, you only call data_batch.cuda() inside the training loop, which makes me think the data is only passed to the GPU one batch at a time as we train.
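Roughly the pattern I mean, as a minimal sketch (the model, data and hyperparameters here are just placeholders):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# dummy data standing in for the real dataset; the point is where .cuda() happens
dataset = TensorDataset(torch.randn(10000, 32), torch.randint(0, 2, (10000,)))
loader = DataLoader(dataset, batch_size=64, shuffle=True)

model = torch.nn.Linear(32, 2).cuda()
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for data_batch, target_batch in loader:
    # the CPU-to-GPU transfer happens here, once per minibatch
    data_batch = data_batch.cuda()
    target_batch = target_batch.cuda()

    optimizer.zero_grad()
    loss = criterion(model(data_batch), target_batch)
    loss.backward()
    optimizer.step()
```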

Am I wrong? What is the best practice here to take full advantage of the GPU?


Hi, I’m not a PyTorch expert nor a PyTorch developer, but have you tried increasing the mini-batch size? Increase it until you get an out-of-memory error (or something equivalent, where it cannot allocate any more GPU memory), then decrease it until you find the sweet spot.

The closest equivalent is pin_memory=True on the DataLoader, though I’m not super familiar with it.
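If I’m reading the docs right, it’s an argument you pass when constructing the DataLoader; something like this (the dataset and batch size are made up):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# dummy dataset just to have something to load
dataset = TensorDataset(torch.randn(1000, 32), torch.randint(0, 2, (1000,)))

# pin_memory=True makes the loader return batches in page-locked (pinned)
# host memory, which speeds up the subsequent copy to the GPU
loader = DataLoader(dataset, batch_size=64, shuffle=True, pin_memory=True)
```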


To add to James’ comment:

http://pytorch.org/docs/notes/cuda.html#use-pinned-memory-buffers

Rather than placing everything on the GPU in one go, you can ensure that each CPU-GPU memory transfer is as fast as possible.
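A minimal sketch of what that note describes; pinned source tensors also let you request asynchronous copies (non_blocking=True in recent PyTorch versions), and the dataset here is just a stand-in:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# dummy data standing in for the real dataset
dataset = TensorDataset(torch.randn(1000, 32), torch.randint(0, 2, (1000,)))
loader = DataLoader(dataset, batch_size=64, shuffle=True, pin_memory=True)

for data_batch, target_batch in loader:
    # because the source batches are in pinned memory, these copies return
    # control to the CPU immediately and can overlap with queued GPU work
    data_batch = data_batch.cuda(non_blocking=True)
    target_batch = target_batch.cuda(non_blocking=True)
    # ... forward / backward / optimizer step on data_batch, target_batch ...
```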
