How to push as much data as possible to the GPU?

I’m a former Theano user, and in Theano, if you wanted to use the GPU and had enough GPU memory available, you would store your whole dataset in a theano.shared() variable to avoid going back and forth between CPU and GPU.

With PyTorch I’m not sure if there’s an equivalent to this… Let’s say I’m using a DataLoader class to iterate over my dataset minibatches. In the examples I’ve seen, you would only call data_batch.cuda() within the training loop, which makes me think that we’re only passing the data to the GPU as we are training.

Am I wrong? What is the best practice here in order to get full advantage of the GPU?
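To make the question concrete, here is roughly what I mean, with toy tensors and `.to(device)` so the snippet also runs without a GPU (the shapes and names are just for illustration):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy data, purely for illustration
X = torch.randn(1000, 20)
y = torch.randint(0, 2, (1000,))

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Pattern 1: what I see in the examples, copying every batch in the loop
loader = DataLoader(TensorDataset(X, y), batch_size=64)
for data_batch, target_batch in loader:
    data_batch = data_batch.to(device)     # CPU -> GPU copy each iteration
    target_batch = target_batch.to(device)

# Pattern 2: the theano.shared()-style preload I am asking about
X_dev, y_dev = X.to(device), y.to(device)  # one-time copy
for i in range(0, len(X_dev), 64):
    data_batch = X_dev[i:i + 64]           # slicing stays on the device
    target_batch = y_dev[i:i + 64]
```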


Hi, I’m not a PyTorch expert or developer, but have you tried increasing the mini-batch size? Increase it until you get an out-of-memory (or equivalent allocation) error, then decrease it until you find the sweet spot.
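The trial-and-error idea above can be automated. A rough sketch (the function name, the halving strategy, and the defaults are all mine, not anything built into PyTorch):

```python
import torch

def largest_batch_size(model, sample_shape, device, start=8192):
    """Halve a candidate batch size until one forward/backward pass fits.

    `model`, `sample_shape`, and `start` are placeholders for your own
    setup; this is just a sketch of the search described above.
    """
    bs = start
    while bs >= 1:
        try:
            x = torch.randn(bs, *sample_shape, device=device)
            model(x).sum().backward()  # allocates activations + gradients
            return bs
        except RuntimeError:  # CUDA out-of-memory surfaces as RuntimeError
            if torch.cuda.is_available():
                torch.cuda.empty_cache()  # release the failed allocation
            bs //= 2
    return 0
```

You would then train with a batch size somewhat below the returned value, since real training steps (optimizer state, longer-lived buffers) use more memory than a single pass.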

The closest equivalent is pin_memory=True on the DataLoader. It doesn’t keep the whole dataset on the GPU, but it pins each batch in page-locked host memory so the copy to the GPU is faster, though I’m not super familiar with it.
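If it helps, this is how I understand it would be wired up (a sketch with made-up shapes, and the cuda() calls are guarded so it also runs without a GPU):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,)))

# pin_memory=True makes the DataLoader put each batch in page-locked
# (pinned) host RAM, which the GPU can fetch with a faster async copy.
loader = DataLoader(ds, batch_size=32, shuffle=True, pin_memory=True)

for x, y in loader:
    if torch.cuda.is_available():
        # non_blocking=True lets the copy overlap with GPU computation;
        # the overlap only happens when the source tensor is pinned
        x = x.cuda(non_blocking=True)
        y = y.cuda(non_blocking=True)
    # ... training step ...
```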


To add to James’ comment:

Rather than placing everything on the GPU in one go, you can ensure that each CPU-GPU memory transfer is as fast as possible.
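Concretely, that means pinning the host-side tensor and then doing an asynchronous copy, something like this (a sketch with a toy tensor, guarded so it also runs without a GPU):

```python
import torch

x = torch.randn(1024, 128)  # toy host-side tensor

if torch.cuda.is_available():
    # pin_memory() moves the tensor into page-locked host RAM, so the
    # following cuda(non_blocking=True) copy can run asynchronously and
    # overlap with other GPU work instead of stalling the CPU.
    x = x.pin_memory()
    x_gpu = x.cuda(non_blocking=True)
else:
    x_gpu = x  # no GPU available: nothing to transfer
```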