Hi, I wrap a model in a custom class called Agent. I use a for loop to iterate over questions, and within the loop I need to create different instances of the Agent class. The model is the same, but some variables inside each Agent instance are different. I notice that as the number of iterations increases, GPU memory usage also increases. A small code snippet to reproduce this phenomenon:
```python
import torch
from transformers import AutoModelForCausalLM

for _ in range(10):
    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Meta-Llama-3-70B-Instruct", device_map="auto"
    )
    # trial 1: release the model before the next iteration
    del model
    torch.cuda.empty_cache()
```
I tried `del model` followed by `torch.cuda.empty_cache()`, but GPU usage still keeps increasing after each iteration. Is there an appropriate way to avoid this growth in GPU memory?
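For context, here is a stripped-down sketch of the pattern I am actually aiming for: load the model once and share it across Agent instances, so only the lightweight per-question settings change on each iteration. The class body and names below are simplified placeholders, and a plain Python object stands in for the real 70B model so the snippet runs without a GPU:

```python
import gc

class Agent:
    """Simplified stand-in for my Agent class: it wraps a shared model
    plus a few per-question settings (names here are placeholders)."""
    def __init__(self, model, temperature):
        self.model = model              # shared, loaded once outside the loop
        self.temperature = temperature  # differs per Agent instance

    def answer(self, question):
        # The real implementation would call self.model.generate(...);
        # stubbed out here so the sketch has no GPU dependency.
        return f"answer to {question!r} at T={self.temperature}"

# Load the model once, outside the loop. A cheap placeholder object is
# used here instead of the real 70B checkpoint.
shared_model = object()

questions = ["q1", "q2", "q3"]
for i, q in enumerate(questions):
    agent = Agent(shared_model, temperature=0.2 * (i + 1))
    print(agent.answer(q))
    del agent  # drops only the thin wrapper; the model stays loaded

gc.collect()
```

In this version each iteration only creates and destroys the small Agent wrapper, never reloading the model itself, which is the behaviour I would like to achieve with the real checkpoint.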