Hello. During training I need to unfold some feature maps of my network, which is CUDA-memory consuming. The program crashes with an "out of CUDA memory" error after a few training loops. However, the variables I allocate inside the `for` statement should be local, so I don't understand why memory runs out after several successful loops; I would expect the memory consumption to be the same in every loop. Can anyone help me out? Thanks!
Two methods which I frequently use for debugging:
By @smth:

```python
import gc
import os
import sys

import psutil
import torch

def memReport():
    for obj in gc.get_objects():
        if torch.is_tensor(obj):
            print(type(obj), obj.size())

def cpuStats():
    print(sys.version)
    print(psutil.cpu_percent())
    print(psutil.virtual_memory())  # physical memory usage
    pid = os.getpid()
    py = psutil.Process(pid)
    memoryUse = py.memory_info().rss / 2. ** 30  # resident memory use in GB
    print('memory GB:', memoryUse)

cpuStats()
memReport()
```
Edited by @smth for PyTorch 0.4 and above, which doesn't need the `Variable` wrapper.
Thanks! Does Python's gc collect garbage as soon as a variable has no references, or with a delay?
@chenchr it does immediately, unless you have reference cycles.
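This is easy to see with `weakref`; a minimal CPython sketch (not PyTorch-specific; `Payload` is just a placeholder class) showing that a reference-counted object dies immediately on reassignment, while a reference cycle survives until the collector runs:

```python
import gc
import weakref

gc.disable()  # make the cycle case deterministic: no automatic collection

class Payload:
    pass

# Plain reassignment: the old object is freed immediately (refcount hits 0).
obj = Payload()
ref = weakref.ref(obj)
obj = Payload()          # old Payload has no references left
print(ref() is None)     # True: freed the moment it was replaced

# Reference cycle: freeing is deferred until the garbage collector runs.
a = Payload()
b = Payload()
a.partner = b
b.partner = a
ref2 = weakref.ref(a)
a = None
b = None
print(ref2() is None)    # False: the cycle keeps both objects alive
gc.collect()
print(ref2() is None)    # True: only after an explicit collection

gc.enable()
```

Note this immediate freeing is a CPython refcounting detail, not a language guarantee.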
Thanks! Do you mean that in

```python
def func():
    a = Variable(torch.randn(2, 2))
    a = Variable(torch.randn(100, 100))
    return
```
the memory allocated by `a = Variable(torch.randn(2, 2))` will be freed as soon as `a = Variable(torch.randn(100, 100))` is executed?
But don't forget that once you call `a = Variable(torch.rand(2, 2))`, `a` holds the data. When you then call `a = Variable(torch.rand(100, 100))`, first `Variable(torch.rand(100, 100))` is allocated (so the first tensor is still in memory), then it is assigned to `a`, and only then is `Variable(torch.rand(2, 2))` freed.
Does that mean there has to be enough memory for two variables during the creation of the second one?
That means that if you have something like:

```python
a = torch.rand(1024, 1024, 1024)  # 4GB
# the following line allocates 4GB extra before the assignment,
# so you need to have 8GB in order for it to work
a = torch.rand(1024, 1024, 1024)  # now you only use 4GB
```
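The doubled peak can be observed without a GPU; a minimal sketch using `tracemalloc` with `bytearray`s as stand-ins for tensors (`del`-ing the old object before allocating the new one is the usual way to avoid the extra peak):

```python
import tracemalloc

SIZE = 50 * 1024 * 1024  # ~50 MB per buffer

# Case 1: plain reassignment. The old buffer is still referenced while
# the new one is allocated, so the peak is roughly 2x one buffer.
tracemalloc.start()
a = bytearray(SIZE)
a = bytearray(SIZE)          # both buffers briefly alive
_, peak_reassign = tracemalloc.get_traced_memory()
tracemalloc.stop()

# Case 2: del first. The old buffer is freed before the new allocation,
# so the peak stays at roughly 1x one buffer.
tracemalloc.start()
b = bytearray(SIZE)
del b                        # drop the reference before reallocating
b = bytearray(SIZE)
_, peak_del = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(peak_reassign > 1.5 * peak_del)  # True: reassignment doubles the peak
```

The same reasoning applies to the CUDA case above: `del a` before the second `torch.rand(1024, 1024, 1024)` keeps the peak at 4GB instead of 8GB.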