DeepSpeed flops profiler for the Llama-3-8B model in compile mode

Hi all,
I want to find the total number of FLOPs for an inference pass of the Llama-3-8B model in compile mode, using the DeepSpeed flops profiler. I am using the following code for this purpose:

# model, tokenizer, batch_sentences, and generation_config are assumed to be defined earlier
import time
import torch
from deepspeed.profiling.flops_profiler import FlopsProfiler

# compile the generate call before profiling
model.generate = torch.compile(model.generate, backend="aot_eager")

prof = FlopsProfiler(model)
prof.start_profile()

# tokenize the batch and keep only the token ids
inputs = tokenizer(batch_sentences, truncation=True, padding="max_length",
                   max_length=256, return_tensors="pt").to(device="cpu")
input_ids = inputs["input_ids"]

start_time1 = time.time()
with torch.no_grad():
    outputs = model.generate(input_ids, generation_config=generation_config)
end_time1 = time.time()  # time taken to generate the inference

prof.stop_profile()

tot_flops = prof.get_total_flops(as_string=False)
print(tot_flops)

When I print the tot_flops variable to get the total number of FLOPs, I get a value that looks like 123456i87j instead of a plain number. Can you please help me understand what is wrong in my code? How do I get the total number of FLOPs?
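
For reference, here is the small sanity check I am planning to run right after prof.stop_profile(), just to see what type and value the profiler actually returns (I am assuming from the DeepSpeed profiler docs that get_total_flops also accepts as_string=True and that end_profile() removes the profiling hooks; everything else is the same as in the code above):

tot_flops = prof.get_total_flops(as_string=False)
print(type(tot_flops), tot_flops)             # expecting a plain int here
print(prof.get_total_flops(as_string=True))   # human-readable string form of the same count
prof.end_profile()                            # clean up the profiler hooks on the model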