Clarification of FLOPs/MACs in model descriptions

Hi there!

I noticed that the FLOPs reported in the torchvision library differ from those computed with the torch profiler. To compare, I computed the FLOPs for several torchvision architectures using Meta's fvcore library and the official torch profiler:

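| Architecture | Reported (G) | fvcore (G) | torch profiler (G) |
|---|---|---|---|
| AlexNet | 0.71 | 0.71 | 1.43 |
| ResNet 50 | 4.09 | 4.11 | 8.18 |
| DenseNet 121 | 2.83 | 2.87 | 5.67 |
| Swin B | 15.43 | 15.47 | 30.88 |
| MaxViT T | 5.56 | 5.61 | 11.13 |
| ViT-B 16 | 17.56 | 16.87 | 33.70 |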

These results suggest that the torch profiler counts actual FLOPs, while the fvcore library counts MACs (FLOPs ≈ 2 × MACs). However, the descriptions of the torchvision models in this link show MACs rather than FLOPs. Could this be clarified in the official documentation?

Also, I noticed that the fvcore MACs for ViT-B 16 differ from the reported value ($\sim 4\%$; 16.87 vs. 17.56). Do you have any idea why?
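As a quick sanity check on the numbers above, the ratios can be recomputed directly from the posted values. The sketch below is plain Python with the table values hard-coded; the variable and function names are illustrative only and are not part of fvcore or torch.

```python
# Values copied from the table in this post (units of 1e9 operations).
# Each tuple is (reported, fvcore, torch profiler).
results = {
    "AlexNet": (0.71, 0.71, 1.43),
    "ResNet 50": (4.09, 4.11, 8.18),
    "DenseNet 121": (2.83, 2.87, 5.67),
    "Swin B": (15.43, 15.47, 30.88),
    "MaxViT T": (5.56, 5.61, 11.13),
    "ViT-B 16": (17.56, 16.87, 33.70),
}


def profiler_to_fvcore_ratio(row):
    """Ratio of the torch profiler count to the fvcore count."""
    _, fvcore, profiler = row
    return profiler / fvcore


def reported_gap_percent(row):
    """Relative difference between the reported and fvcore values, in percent."""
    reported, fvcore, _ = row
    return 100.0 * abs(reported - fvcore) / reported


for name, row in results.items():
    print(f"{name:<13} profiler/fvcore = {profiler_to_fvcore_ratio(row):.2f}, "
          f"reported vs fvcore gap = {reported_gap_percent(row):.1f}%")
```

With these numbers, every profiler/fvcore ratio comes out close to 2, and ViT-B 16 is the only row where the reported and fvcore values differ by more than 2%.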

Thank you very much!

Code snippet to calculate the FLOPs with torch 2.1 and CUDA 11.8:

import torch
import torchvision.models as models
from fvcore.nn import FlopCountAnalysis, flop_count_table

models_to_load = ['alexnet', 'resnet50', 'densenet121', 'swin_b', 'maxvit_t', 'vit_b_16']
x = torch.randn(1, 3, 224, 224, device='cuda')  # one 224x224 RGB image

class AutoCastModel(torch.nn.Module):
    def __init__(self, model):
        super().__init__()  # required before assigning a submodule
        self.model = model

    @torch.autocast('cuda')  # to use the traditional attention instead of the flash attention
    def forward(self, x):
        return self.model(x)

for m in models_to_load:
    print('=' * 79)
    print('=' * 79)
    # Build the model with random weights and move it to the same device as the input
    model = AutoCastModel(getattr(models, m)()).cuda()

    # Count with fvcore
    flop = FlopCountAnalysis(model, x)
    print(flop_count_table(flop, max_depth=0))
    # Count with the torch profiler
    with torch.profiler.profile(with_flops=True) as p, torch.autocast('cuda'):
        _ = model(x)
    # print(p.key_averages().table(sort_by="flops", row_limit=5))
    print('{:.2f} GFLOPS (torch profile)'.format(sum(k.flops for k in p.key_averages()) / 1e9))

I also observed the same phenomenon with my customized model. The FLOPs counted by torch.profiler.profile are approximately twice those counted by mmengine.analysis.get_model_complexity_info() (mmengine doc).

I suspect this is caused by an ambiguous definition of FLOPs. torch.profiler.profile counts a multiply-add as two operations, in line with NVIDIA's documentation:

Each multiply-add comprises two operations, thus one would multiply the throughput in the table by 2 to get FLOP counts per clock.

mmengine, by contrast, follows fvcore and counts a multiply-add as one operation.
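To make the two conventions concrete, here is a minimal hand count for a single convolution layer. The layer shape is an assumption chosen for illustration (to my knowledge it matches the first convolution of torchvision's AlexNet on a 224x224 input: 3 input channels, 64 output channels, 11x11 kernel, stride 4, padding 2). The helper names are hypothetical and do not come from fvcore, mmengine, or torch, and bias additions are ignored.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv2d_macs(c_in, c_out, kernel, h_out, w_out):
    """Multiply-accumulate count: one MAC per kernel weight per output element."""
    return c_out * h_out * w_out * c_in * kernel * kernel


# Assumed layer shape (illustrative, see the text above).
h_out = w_out = conv_output_size(224, kernel=11, stride=4, padding=2)
macs = conv2d_macs(c_in=3, c_out=64, kernel=11, h_out=h_out, w_out=w_out)

# Convention A (fvcore/mmengine, as described in this thread):
# a multiply-add counts as one operation.
ops_one_per_mac = macs

# Convention B (torch profiler, as described in this thread):
# the multiply and the add count as two operations.
ops_two_per_mac = 2 * macs

print(f"output size: {h_out}x{w_out}")
print(f"one op per multiply-add:  {ops_one_per_mac / 1e9:.4f} G")
print(f"two ops per multiply-add: {ops_two_per_mac / 1e9:.4f} G")
```

The same layer therefore yields two numbers that differ by exactly a factor of 2, depending only on which convention the tool uses.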