I am using the PyTorch profiler (torch.profiler) to measure my model's FLOPs:
import torch

model = model.cuda()
input_tensor = torch.randn((1, 3, img_size, img_size)).cuda()

with torch.profiler.profile(
    activities=[
        torch.profiler.ProfilerActivity.CPU,
        torch.profiler.ProfilerActivity.CUDA,
    ],
    profile_memory=True,
    record_shapes=True,
    with_stack=True,
    with_flops=True,
) as prof:
    model(input_tensor)

print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
# Sum the per-operator FLOP estimates reported by the profiler
flops = sum(k.flops for k in prof.key_averages())
print(f"Total FLOPs: {flops / 1e9} G")

# Move the dummy input off the GPU and drop the reference
input_tensor = input_tensor.cpu().detach()
del input_tensor
When I evaluate on my test data before this code block, the result is always the same. When I evaluate after it, the result is always different.
Is there a reason for this?
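
One way to narrow this down is to check whether the profiled forward pass modifies the model itself. The sketch below assumes (this is not confirmed by the code above) that the model is still in training mode when `model(input_tensor)` runs and that it contains layers such as BatchNorm, whose running statistics are updated on every training-mode forward pass. The `toy` model and the helper `entries_changed_by_forward` are hypothetical stand-ins, since the real model is not shown, and everything runs on the CPU so it can be tried without a GPU.

```python
import torch
import torch.nn as nn


def entries_changed_by_forward(model, x):
    """Return the state_dict keys whose values differ after one forward pass."""
    # Clone first: state_dict() tensors share storage with the live model.
    before = {k: v.clone() for k, v in model.state_dict().items()}
    model(x)
    after = model.state_dict()
    return [k for k in before if not torch.equal(before[k], after[k])]


# Hypothetical stand-in for the real model (assumption: it has BatchNorm).
toy = nn.Sequential(nn.Conv2d(3, 4, kernel_size=3), nn.BatchNorm2d(4))
x = torch.randn(2, 3, 8, 8)

# Training mode: BatchNorm updates its running statistics during forward.
toy.train()
changed_train = entries_changed_by_forward(toy, x)

# Eval mode under no_grad: the forward pass should leave the model untouched.
toy.eval()
with torch.no_grad():
    changed_eval = entries_changed_by_forward(toy, x)

print("changed in train mode:", changed_train)
print("changed in eval mode:", changed_eval)
```

If the same check on the real model reports changed entries, calling `model.eval()` and wrapping the profiled forward pass in `torch.no_grad()` would be the first thing to try. If nothing changes, the cause is likely elsewhere, for example dropout being active during the later evaluation.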