How to free CPU RAM after `module.to(cuda_device)`?

t.g · June 28, 2018, 12:43am

I am trying to optimize memory consumption of a model and profiled it using memory_profiler. It appears to me that calling module.to(cuda_device) copies to GPU RAM, but doesn’t release memory of CPU RAM.
Is there a way to reclaim some/most of CPU RAM that was originally allocated for loading/initialization after moving my modules to GPU?

Some more info:

Line 214, uses about 2GB to initialize my model.
Line 221 or later, I no longer need this CPU RAM stuff, and I am trying to empty it (even with forced GC in line 224 didn’t help!)

Line #    Mem usage    Increment   Line Contents
================================================
   209     88.7 MiB     88.7 MiB       @profile
   210                                 def __init__(self, exp: Experiment, model=None, lr=0.0001):
   211     88.7 MiB      0.0 MiB           self.exp = exp
   212     88.7 MiB      0.0 MiB           self.start_epoch = 0
   213     88.7 MiB      0.0 MiB           if model is None:
   214   2159.7 MiB   2071.0 MiB               model = Seq2Seq(**exp.get_model_args()).to(device)
   215   2159.7 MiB      0.0 MiB               last_check_pt, last_epoch = self.exp.get_last_saved_model()
   216   2159.7 MiB      0.0 MiB               if last_check_pt:
   217                                             log.info(f"Resuming training from epoch:{self.start_epoch}, model={last_check_pt}")
   218                                             self.start_epoch = last_epoch + 1
   219                                             model.load_state_dict(torch.load(last_check_pt))
   220   2159.7 MiB      0.0 MiB           log.info(f"Moving model to device = {device}")
   221   2159.7 MiB      0.0 MiB           self.model = model.to(device=device)
   222   2159.7 MiB      0.0 MiB           self.model.train()
   223   2159.7 MiB      0.0 MiB           del model           # this was on CPU, free that memory
   224   2159.7 MiB      0.0 MiB           gc.collect()        # should the GC cleanup CPU buffers after moving to GPU ?
   225   2159.7 MiB      0.0 MiB           self.optimizer = optim.Adam(self.model.parameters(), lr=lr)

Edit:
version pytorch 0.4.0 on linux-64

tom · June 28, 2018, 9:23am

You could try Python garbage collection (import gc ; gc.collect()).

Best regards

Thomas

t.g · June 29, 2018, 11:24pm

That didn’t help. (see last but one line in code snippet of original post)

t.g · June 29, 2018, 11:26pm

I will be thankful if somebody who has seen the implementation of .to() functionality also comment on this matter

t.g · July 1, 2018, 3:59am

Update: I found what I was looking for!
Torch (in version 0.4.0) takes about 2GB CPU RAM upfront when the first cuda/GPU tensor is allocated. That was not a memory leak.

Line #    Mem usage    Increment   Line Contents
================================================
     4     29.2 MiB     29.2 MiB   @profile
     5                             def main():
     6     95.6 MiB     66.4 MiB       import torch
     7     95.6 MiB      0.0 MiB       size = (1024, 1024, 10)
     8     95.6 MiB      0.0 MiB       cpu, gpu = torch.device('cpu'), torch.device('cuda:0')
     9                                 # couple of CPU tensors
    10    135.6 MiB     40.0 MiB       t_cpu1 = torch.zeros(size, device=cpu)
    11    175.6 MiB     40.0 MiB       t_cpu2 = torch.zeros(size, device=cpu)
    12                                 # couple of GPU tensors
    13   2120.5 MiB   1944.9 MiB       t_gpu1 = torch.zeros(size, device=gpu)
    14   2120.5 MiB      0.0 MiB       t_gpu2 = torch.zeros(size, device=gpu)
    15   2120.6 MiB      0.1 MiB       t_gpu3 = t_cpu1.to(gpu)
    16   2080.6 MiB    -40.0 MiB       del t_cpu1
    17   2080.6 MiB      0.0 MiB       gc.collect()  # does this free memory of t_cpu1 ?
    18   2040.6 MiB    -40.0 MiB       del t_cpu2
    19   2040.6 MiB      0.0 MiB       gc.collect()  # does this free memory of t_cpu2?
    20   2040.6 MiB      0.0 MiB       del t_gpu1, t_gpu2, t_gpu3
    21   2040.6 MiB      0.0 MiB       gc.collect()  # these doesnt take any CPU RAM !!
    22   2040.6 MiB      0.0 MiB       del torch    # remove torch module from context
    23   2040.6 MiB      0.0 MiB       gc.collect()  # does it free everything ?

P.S.
the memory leak was somewhere else on my code, I was misled by the initial 2GB unaccounted memory.

Thanks, to anyone who attempted to answer my question!

ChanggongZhang · December 11, 2018, 9:32am

Hello t.g,
I am facing the same problem now, i.e., “Torch (in version 0.4.0) takes about 2GB CPU RAM upfront when the first cuda/GPU tensor is allocated”.
Therefore, I am kindly asking you how you manage to solve this problem? b.t.w, what do you mean by “misled by the initial 2GB unaccounted memory”?

Looking forward to your reply.
Best,
Changgong

t.g · December 11, 2018, 10:05pm

I wasnt able to fix it. I understood that its the way pytorch works and moved on.

Ashbajawed · August 26, 2021, 1:34pm

hey how did you get memory usage at each line

mikhail.trofimov · July 1, 2022, 3:45pm

this can be done with memory_profiler

mikhail.trofimov · July 1, 2022, 4:03pm

Any ideas why it works this way?

I also faced with this behaviour

import torch
from memory_profiler import profile
import gc

import resource
def mem_usage():
    usage=resource.getrusage(resource.RUSAGE_SELF)
    return f'mem usage={usage[2]/1024.0} mb'

@profile
def load_func(path):
    m = torch.load(path, map_location="cuda:0")
    return m

if __name__ == "__main__":
    big_model_path ='pytorch_model.bin'
    m = load_func(big_model_path)
    gc.collect()
    print(f'cuda memory: {torch.cuda.memory_allocated()  /1024/1024}')
    print(mem_usage())

Output:

$python3 memory_test.py
Filename: memory_test.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    10    212.5 MiB    212.5 MiB           1   @profile
    11                                         def load_func(path):
    12   2767.9 MiB   2555.3 MiB           1       m = torch.load(path, map_location="cuda:0")
    13   2767.9 MiB      0.0 MiB           1       return m


cuda memory: 516.2373046875
mem usage=3007.8828125 mb