I’m not an expert, but keep in mind that when you call .cpu() you are making a copy in RAM; it doesn’t mean the GPU version gets removed.
You can delete the GPU variable to achieve that. The memory then becomes available, but its contents aren’t overwritten until new variables use that space. In short, the memory manager knows it can reuse those addresses, but it doesn’t wipe them, since that would take time; it just overwrites them when necessary.
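Here is a rough sketch of how you could check this yourself, assuming PyTorch with a CUDA device. The names `demo` and `snap` and the tensor size are made up for illustration. It records `torch.cuda.memory_allocated()` (bytes held by live tensors) and `torch.cuda.memory_reserved()` (bytes the allocator keeps cached) at each step. As far as I understand, the reserved number is the part that stays around after `del`, and `torch.cuda.empty_cache()` is what hands it back to the driver.

```python
try:
    import torch
except ImportError:  # assumption: without PyTorch the sketch simply does nothing
    torch = None


def demo():
    """Return a list of (label, allocated_bytes, reserved_bytes), or None without CUDA."""
    if torch is None or not torch.cuda.is_available():
        return None

    stats = []

    def snap(label):
        # allocated = memory used by live tensors; reserved = memory cached by the allocator
        stats.append((label, torch.cuda.memory_allocated(), torch.cuda.memory_reserved()))

    x_gpu = torch.ones(1024, 1024, device="cuda")  # arbitrary size, about 4 MB of float32
    snap("after allocation")

    x_cpu = x_gpu.cpu()  # copy in RAM; x_gpu still occupies GPU memory
    snap("after .cpu()")

    del x_gpu  # drop the last reference to the GPU tensor
    snap("after del")

    torch.cuda.empty_cache()  # release cached, unused blocks back to the driver
    snap("after empty_cache()")

    return stats


if __name__ == "__main__":
    result = demo()
    if result is None:
        print("No CUDA device available, nothing to measure.")
    else:
        for label, allocated, reserved in result:
            print(f"{label:>22}: allocated={allocated} reserved={reserved}")
```

If it behaves the way I described, the allocated number should stay the same after `.cpu()` and only drop after `del`.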
Anyway this is just what I understood from other people’s explanations.