Hi, I wrap a model in a custom class called Agent. I use a for loop to iterate over questions, and within the loop I need to create different instances of the Agent class. The model is the same, but some variables inside each Agent instance are different. I notice that as the number of iterations increases, GPU memory usage also increases. A small code snippet to reproduce this phenomenon:
```python
import torch
from transformers import AutoModelForCausalLM

for _ in range(10):
    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Meta-Llama-3-70B-Instruct", device_map="auto"
    )
    # trial 1: release the model before the next iteration
    del model
    torch.cuda.empty_cache()
```
I tried `del model` followed by `torch.cuda.empty_cache()`, but GPU usage still keeps increasing after each iteration. Is there an appropriate way to avoid this growth in GPU memory?
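For context, here is a stripped-down sketch of the pattern I am actually aiming for: load the model once and share it across Agent instances, so only the lightweight per-question settings change on each iteration. The class body and names below are simplified placeholders, and a plain Python object stands in for the real 70B model so the snippet runs without a GPU:

```python
import gc

class Agent:
    """Simplified stand-in for my Agent class: it wraps a shared model
    plus a few per-question settings (names here are placeholders)."""
    def __init__(self, model, temperature):
        self.model = model              # shared, loaded once outside the loop
        self.temperature = temperature  # differs per Agent instance

    def answer(self, question):
        # The real implementation would call self.model.generate(...);
        # stubbed out here so the sketch has no GPU dependency.
        return f"answer to {question!r} at T={self.temperature}"

# Load the model once, outside the loop. A cheap placeholder object is
# used here instead of the real 70B checkpoint.
shared_model = object()

questions = ["q1", "q2", "q3"]
for i, q in enumerate(questions):
    agent = Agent(shared_model, temperature=0.2 * (i + 1))
    print(agent.answer(q))
    del agent  # drops only the thin wrapper; the model stays loaded

gc.collect()
```

In this version each iteration only creates and destroys the small Agent wrapper, never reloading the model itself, which is the behaviour I would like to achieve with the real checkpoint.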