I have a PyTorch model deployed in production. The model is retrained every day and replaces the old one. During this process, memory usage increases monotonically until it saturates. I reproduced the behavior with the following code. Is there a way to avoid this memory leak?
import gc
from torchvision.models import resnet50, ResNet50_Weights
import resource
import matplotlib.pyplot as plt


def mem_usage():
    # ru_maxrss is reported in kilobytes on Linux; dividing by 1024 gives MB.
    memory_usage_rss_self = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024
    memory_usage_rss_children = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss / 1024
    return memory_usage_rss_self + memory_usage_rss_children


memory = []
for i in range(300):
    # Load the model, drop it again, and record memory after garbage collection.
    model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1).eval().cpu()
    del model
    gc.collect()
    memory.append(mem_usage())

plt.plot(memory)
plt.ylabel('memory')
plt.xlabel('time')
plt.savefig('memory.pdf', bbox_inches='tight', pad_inches=0)
I ran your code in a Colab notebook and saw the same thing. However, the increase seems to come from data being stored in memory rather than from a memory leak on the CPU. In other words, you might have some variables (lists, objects, etc.) that keep growing, which increases memory usage over time.
Here is the code to reproduce my tests: memory_usage_overtime.py
import gc
import torch
from torchvision.models import resnet50, ResNet50_Weights
import resource
import matplotlib.pyplot as plt


def mem_usage():
    memory_usage_rss_self = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024
    memory_usage_rss_children = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss / 1024
    return memory_usage_rss_self + memory_usage_rss_children


memory = []
for i in range(300):
    gc.collect()
    memory.append(mem_usage())
    plt.clf()
    plt.plot(memory)
    plt.ylabel('memory')
    plt.xlabel('time')
    plt.show()
Here is the code for how to measure your GPU memory, which shows that the GPU usage isn’t increasing over time: memory_usage_gpu.py
import gc
import torch
from torchvision.models import resnet50, ResNet50_Weights
import resource
import matplotlib.pyplot as plt
import subprocess as sp
import os
from threading import Thread, Timer
import sched, time


def get_gpu_memory():
    # Split the nvidia-smi output into lines, dropping the trailing empty entry.
    output_to_list = lambda x: x.decode('ascii').split('\n')[:-1]
    ACCEPTABLE_AVAILABLE_MEMORY = 1024
    COMMAND = "nvidia-smi --query-gpu=memory.used --format=csv"
    try:
        # [1:] skips the CSV header line.
        memory_use_info = output_to_list(sp.check_output(COMMAND.split(), stderr=sp.STDOUT))[1:]
    except sp.CalledProcessError as e:
        raise RuntimeError("command '{}' return with error (code {}): {}".format(e.cmd, e.returncode, e.output))
    # Each remaining line looks like "<value> MiB"; keep the integer value per GPU.
    memory_use_values = [int(x.split()[0]) for x in memory_use_info]
    return memory_use_values


memory = []
for _ in range(50):
    torch.cuda.empty_cache()
    model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1).eval().to("cuda")
    del model
    memory.append(get_gpu_memory()[0])
    plt.clf()
    plt.plot(memory)
    plt.ylabel('memory')
    plt.xlabel('time')
    plt.show()
Thanks @sudomaze. I ran your GPU code and the memory does not change. I also ran your CPU code (without model creation, only storing the memory usage) and I do not see an increase in memory usage there either. It is very interesting that it behaves differently. (I made one quick change: I moved the plotting commands out of the for loop.)
The first code should show you that the memory increases, which indicates that the increase comes from the memory list being updated.
You might try to exclude the variables in your environment from the memory usage calculation, because updating a list, object, etc. will increase memory usage.
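One way to check how much the list itself contributes is to estimate its footprint directly. This is only a rough sketch: `list_footprint_bytes` is a hypothetical helper, the 300 floats stand in for the recorded samples, and `sys.getsizeof` only counts the list object and its elements, not allocator overhead.

```python
import sys


def list_footprint_bytes(values):
    """Approximate bytes held by a list of floats: the list object plus each element."""
    return sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)


# Stand-in for the 300 memory samples collected in the loop (assumed values).
memory = [float(i) for i in range(300)]
footprint_kb = list_footprint_bytes(memory) / 1024

if __name__ == "__main__":
    print(f"300 samples occupy roughly {footprint_kb:.1f} kB")
```

Comparing this number with the growth you see in the plot should tell you whether the list can account for it.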
Hi @ptrblck. I observe a memory increase caused by loading the PyTorch model on the CPU. I created a small script that reproduces my problem. Do you have any suggestions for how I can resolve this? Thanks in advance.
Sorry, I don’t fully understand the use case as I’m only seeing a small kB increase in each iteration, which might correspond to the memory values stored in the list.
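It may also be worth checking what is actually being measured. `ru_maxrss` is the maximum resident set size the process has reached so far, so a curve built from it can never go down, even after memory has been released. The sketch below compares that peak value with the current resident set size. It is Linux-only (it reads `/proc/self/statm`), and `peak_rss_mb`, `current_rss_mb` and `demo` are hypothetical helper names, not part of the code posted above.

```python
import os
import resource


def peak_rss_mb():
    """Peak resident set size of this process (ru_maxrss is in kB on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024


def current_rss_mb():
    """Current resident set size, read from /proc/self/statm (Linux only)."""
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE") / (1024 * 1024)


def demo(size_mb=100):
    """Allocate and free a large buffer, sampling (current, peak) RSS at each step."""
    samples = {"before": (current_rss_mb(), peak_rss_mb())}
    # Non-zero fill so the pages are really touched and count towards RSS.
    buffer = b"\x01" * (size_mb * 1024 * 1024)
    samples["with_buffer"] = (current_rss_mb(), peak_rss_mb())
    del buffer
    samples["after_free"] = (current_rss_mb(), peak_rss_mb())
    return samples


if __name__ == "__main__":
    for step, (current, peak) in demo().items():
        print(f"{step:12s} current={current:8.1f} MB  peak={peak:8.1f} MB")
```

If the current value falls back after `del model` and `gc.collect()` while the peak stays flat at its maximum, the plot is showing a high-water mark rather than a leak.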