Pass training set through cuda network without overloading memory

I have a neural network model that I have mapped to a GPU device (hence the CUDA network description in the topic title) and during training I wish to put the entire training set through the network. (For exploratory purposes, not for training)

If I was in evaluation mode for a test set I would map the saved (trained) model to a CPU device and pass the test set in as one giant batch.

However, if I try to do this during training with the CUDA network, I overload the GPU memory. Is there a good way to temporarily stop the model from relying on GPU memory during training, just to put the training set through (without recording gradients)?

My current ideas are:

  • find a way to temporarily map the network to the CPU without saving and reloading the model
  • put the training data through one example at a time (this is very slow and goes against the point of the original problem)

Any help on this would be really appreciated! Apologies for not providing any code, I feel like it’s a problem that maybe has a generalized solution anyway.

I assume “overloading” means your GPU is running out of memory?
If so, you could try to wrap this special forward pass in a with torch.no_grad() block to avoid storing the intermediate activations (which are not needed if you don’t plan on calling backward).
If that doesn’t help and you are still running out of memory, you could of course chunk the data into smaller batches, which should still be faster than passing each sample one by one.
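A minimal sketch combining both suggestions, assuming a small stand-in model and random data in place of your real network and training set:

```python
import torch
import torch.nn as nn

# Hypothetical model and training set, purely for illustration.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

train_data = torch.randn(1000, 10)  # stand-in for the full training set

model.eval()  # optional: puts dropout/batchnorm layers into eval behavior
outputs = []
with torch.no_grad():  # no autograd graph is built, so activations are freed right away
    for chunk in train_data.split(128):  # smaller batches keep peak GPU memory low
        outputs.append(model(chunk.to(device)).cpu())  # move results off the GPU
outputs = torch.cat(outputs)
```

Remember to call model.train() afterwards if you switched to eval mode mid-training.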

You don’t have to save and reload the model in order to push it to the CPU.
Just call model.to('cpu') to move all parameters to the CPU.

In the end I used the model.to('cpu') option and that worked for me, many thanks again! 🙂