Hello. During training I need to unfold some feature maps of my network, which is CUDA-memory consuming. The program crashes with an "out of CUDA memory" error after a few training loops. However, the variables I allocate inside the `for` statement should be local, so I don't understand why memory runs out after several successful loops; I would expect the memory consumption to be the same in every loop. Can anyone help me out? Thanks!
Two methods which I frequently use for debugging:
By @smth:

```python
import gc
import os
import sys

import psutil
import torch

def memReport():
    for obj in gc.get_objects():
        if torch.is_tensor(obj):
            print(type(obj), obj.size())

def cpuStats():
    print(sys.version)
    print(psutil.cpu_percent())
    print(psutil.virtual_memory())  # physical memory usage
    pid = os.getpid()
    py = psutil.Process(pid)
    memoryUse = py.memory_info().rss / 2. ** 30  # resident memory use in GB
    print('memory GB:', memoryUse)

cpuStats()
memReport()
```
Edited by @smth for PyTorch 0.4 and above, which doesn't need the `Variable` wrapper.
Thanks! Does Python's gc collect garbage as soon as a variable has no references, or with a delay?
@chenchr it does immediately, unless you have reference cycles.
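This is easy to see with `weakref`; a minimal CPython sketch (not PyTorch-specific; `Payload` is just a placeholder class) showing that a reference-counted object dies immediately on reassignment, while a reference cycle survives until the collector runs:

```python
import gc
import weakref

gc.disable()  # make the cycle case deterministic: no automatic collection

class Payload:
    pass

# Plain reassignment: the old object is freed immediately (refcount hits 0).
obj = Payload()
ref = weakref.ref(obj)
obj = Payload()          # old Payload has no references left
print(ref() is None)     # True: freed the moment it was replaced

# Reference cycle: freeing is deferred until the garbage collector runs.
a = Payload()
b = Payload()
a.partner = b
b.partner = a
ref2 = weakref.ref(a)
a = None
b = None
print(ref2() is None)    # False: the cycle keeps both objects alive
gc.collect()
print(ref2() is None)    # True: only after an explicit collection

gc.enable()
```

Note this immediate freeing is a CPython refcounting detail, not a language guarantee.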
Thanks! Do you mean that in

```python
def func():
    a = Variable(torch.randn(2, 2))
    a = Variable(torch.randn(100, 100))
    return
```
the memory allocated by `a = Variable(torch.randn(2, 2))` will be freed as soon as `a = Variable(torch.randn(100, 100))` is executed?
But don't forget that once you call `a = Variable(torch.rand(2, 2))`, `a` holds the data. When you then call `a = Variable(torch.rand(100, 100))`, first `Variable(torch.rand(100, 100))` is allocated (so the first tensor is still in memory), then it is assigned to `a`, and only then is `Variable(torch.rand(2, 2))` freed.
Does that mean there has to be enough memory for two variables during the creation of the second one?
That means that if you have something like:

```python
a = torch.rand(1024, 1024, 1024)  # 4GB
# the following line allocates 4GB extra before the assignment,
# so you need to have 8GB in order for it to work
a = torch.rand(1024, 1024, 1024)  # now you only use 4GB
```
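The doubled peak can be observed without a GPU; a minimal sketch using `tracemalloc` with `bytearray`s as stand-ins for tensors (`del`-ing the old object before allocating the new one is the usual way to avoid the extra peak):

```python
import tracemalloc

SIZE = 50 * 1024 * 1024  # ~50 MB per buffer

# Case 1: plain reassignment. The old buffer is still referenced while
# the new one is allocated, so the peak is roughly 2x one buffer.
tracemalloc.start()
a = bytearray(SIZE)
a = bytearray(SIZE)          # both buffers briefly alive
_, peak_reassign = tracemalloc.get_traced_memory()
tracemalloc.stop()

# Case 2: del first. The old buffer is freed before the new allocation,
# so the peak stays at roughly 1x one buffer.
tracemalloc.start()
b = bytearray(SIZE)
del b                        # drop the reference before reallocating
b = bytearray(SIZE)
_, peak_del = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(peak_reassign > 1.5 * peak_del)  # True: reassignment doubles the peak
```

The same reasoning applies to the CUDA case above: `del a` before the second `torch.rand(1024, 1024, 1024)` keeps the peak at 4GB instead of 8GB.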