CPU and GPU memory

Hi everyone.

I’m working with a PyTorch 3D U-Net on an organ segmentation project. A few possibly silly questions popped up while I was considering ways to optimize CPU and GPU memory usage:

When we start training the model (which is already in GPU memory), our input is transferred from CPU RAM to GPU memory, right? Will all the intermediate outputs (the outputs after each Convolution-Activation-Pooling block) then be stored on the GPU after forward propagation? If so, can we move those outputs back to CPU memory in order to save GPU memory? I think that if we feed the model too large an input (say, an HD or 2K image) and the intermediate outputs accumulate on the GPU, it can cause out-of-memory problems. So my question is: is it possible to access each layer of a U-Net model and change the memory location like this? At this point I don’t care about the model’s performance; I just want to know whether it would actually reduce GPU memory usage.

Thanks for your help.


Yes, the outputs will use the same device as the inputs of an operation by default.
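A minimal check of this behavior (the module and tensor names below are illustrative): the output of an op lives on the same device as its inputs, so once the input and model are on the GPU, every intermediate output is too.

```python
import torch

# CPU input + CPU module -> CPU output
x = torch.randn(1, 1, 8, 8)
conv = torch.nn.Conv2d(1, 4, 3)
y = conv(x)
print(y.device)  # cpu

# GPU input + GPU module -> GPU output (guarded, in case no GPU is present)
if torch.cuda.is_available():
    y_gpu = conv.cuda()(x.cuda())
    print(y_gpu.device)  # cuda:0
```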

Also yes. You can use CPU offloading as described here.

Thanks for your reply!

I have one more question. For backward propagation, the intermediate outputs will be copied back to the GPU. In the U-Net architecture, does backpropagation calculate gradients for every layer, involving the (forward) input, the intermediate outputs, and the final output? If so, the idea of moving outputs back to the CPU doesn’t sound reasonable, because the GPU still needs those forward outputs during backpropagation…

During the backward pass the offloaded activations will be moved back to the GPU and the gradient calculation will be performed as it would be done in the original model without offloading.
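To make that round trip concrete, here is a hand-rolled equivalent of the offloading context using `torch.autograd.graph.saved_tensors_hooks` (the hook pair is illustrative, not the library's internal code): the pack hook runs during the forward pass and ships each saved activation to the CPU, and the unpack hook runs during backward and copies it back to its original device just before the gradient for that op is computed.

```python
import torch

def pack_to_cpu(tensor):
    # Called at forward time for every tensor saved for backward.
    return tensor.device, tensor.to("cpu")

def unpack_from_cpu(packed):
    # Called at backward time; restore the tensor to its original device.
    device, tensor = packed
    return tensor.to(device)

layer = torch.nn.Linear(4, 4)
x = torch.randn(2, 4, requires_grad=True)

with torch.autograd.graph.saved_tensors_hooks(pack_to_cpu, unpack_from_cpu):
    loss = layer(x).sum()

loss.backward()  # unpack_from_cpu runs here, then the usual gradient math
```

So at any moment during backward, only the activations needed for the current op need to be resident on the GPU; the rest can stay offloaded.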