I am using a modified version of the SAC algorithm from spinningup/spinup/algos/pytorch/sac at master · openai/spinningup · GitHub. The memory consumption slowly increases until the process is killed. Following the recommendations in How to debug causes of GPU memory leaks?, I created this function:
import gc
import torch

def gc_n():
    # Count every gc-tracked object that is a tensor, or that carries
    # a tensor in its .data attribute (e.g. nn.Parameter).
    n = 0
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj) or (hasattr(obj, 'data') and torch.is_tensor(obj.data)):
                n += 1
        except Exception:
            pass
    return n
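As a side note, a stricter variant that avoids probing `.data` on arbitrary objects might behave better, since `hasattr` calls `getattr` under the hood and can trigger side effects (lazy properties, proxies) on objects from other libraries. A minimal sketch (the name `gc_tensor_count` is my own):

```python
import gc
import torch

def gc_tensor_count():
    # isinstance() only inspects the type, never touches attributes,
    # so no foreign object's property getters can fire.
    return sum(isinstance(obj, torch.Tensor) for obj in gc.get_objects())
```

This misses objects that merely wrap a tensor in `.data`, but it counts every live `torch.Tensor` (including subclasses such as `nn.Parameter`) without executing foreign code.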
When I run it, I can print n to a file, but afterward the code seems to freeze, with no possibility of killing it via Ctrl+C, and this message:
/home/kubicki/.local/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py:126: UserWarning: torch.distributed.reduce_op is deprecated, please use torch.distributed.ReduceOp instead
warnings.warn("torch.distributed.reduce_op is deprecated, please use "
If I use a simpler function:
def gc_simple_n():
    return len(gc.get_objects())
The run proceeds normally, but since every Python object is counted, the result is much harder to interpret.
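To make that raw count easier to interpret, the objects could be bucketed by type name instead of counted in one lump, e.g. with `collections.Counter` (a sketch; `gc_by_type` is a name I made up):

```python
import gc
from collections import Counter

def gc_by_type(top=10):
    # Group all gc-tracked objects by type name and return the most
    # common ones; a steadily growing bucket hints at the leak source.
    counts = Counter(type(obj).__name__ for obj in gc.get_objects())
    return counts.most_common(top)
```

Calling this periodically and diffing the results between iterations should show which type's population keeps growing, without ever touching the objects' attributes.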
Could you help me understand the freeze and get a correct count of the tensors?