How to manage (free and malloc) a tensor's memory ourselves

Hello, I want to manage training myself, including layer-by-layer training and management of the intermediate tensors (feature maps).

I am trying to use PyTorch to enable large-model training by swapping some tensors to the CPU and swapping them back when needed. However, I find it hard to access the feature maps.
Following How to split backward process wrt each layer of neural network?, I can hook into each layer's forward and backward pass. But I can't release the memory of those tensors; can anyone help me with this?
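To make the problem concrete, here is a minimal, hypothetical sketch of the manual swap I have in mind. One detail that tripped me up: a device tensor is only actually freed once *all* references to it are gone, including any the autograd graph holds, so a plain `del` on my own handle often does not release device memory.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

layer = torch.nn.Linear(8, 8).to(device)
x = torch.randn(2, 8, device=device)

with torch.no_grad():         # detach from autograd so no hidden refs remain
    feat = layer(x)           # intermediate feature map on the device

feat_cpu = feat.to("cpu")     # swap out to host memory
del feat                      # drop the last device reference
if device == "cuda":
    torch.cuda.empty_cache()  # return cached blocks to the driver

feat = feat_cpu.to(device)    # swap back in when needed
```

With `torch.no_grad()` this works, but during real training the autograd graph keeps its own references to the saved activations, which is exactly the part I can't control.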
Thanks a lot!

Your use case sounds similar to e.g. DeepSpeed's, and you could check their allocator implementation, which also moves data around to free device memory.
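PyTorch also ships a hook mechanism for exactly this pattern, `torch.autograd.graph.saved_tensors_hooks`, which lets you intercept the activations autograd saves for backward. A sketch of CPU offloading with it (the hook names are my own):

```python
import torch

def pack_to_cpu(tensor):
    # Called when autograd saves a tensor for backward: remember the
    # original device and keep only a CPU copy alive.
    return tensor.device, tensor.to("cpu")

def unpack_from_cpu(packed):
    # Called when backward needs the tensor: copy it back to its device.
    device, tensor = packed
    return tensor.to(device)

model = torch.nn.Sequential(
    torch.nn.Linear(16, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 1),
)

x = torch.randn(4, 16)
with torch.autograd.graph.saved_tensors_hooks(pack_to_cpu, unpack_from_cpu):
    loss = model(x).sum()   # activations are packed to CPU as they are saved
loss.backward()             # activations are unpacked back on demand here
```

There is also a ready-made context manager, `torch.autograd.graph.save_on_cpu`, that does this offload for you.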