How to compute the model complexity of FasterRCNNFPN pretrained from torchvision?

pvtien96 · June 27, 2023, 4:33pm

I got the pretrained FASTERRCNN_RESNET50_FPN model from pytorch (torchvision).
Now I want to compute the model’s complexity (number of parameters and FLOPs) as reported from torchvsion:

How to do this? Normally with classification model (e.g. resnet50), we can use tools such as thop or ptflop. But the main concern is: What is the correct input image size (width & height, channel=3 for sure)? From my reading, FasterCNN accepts unfixed input image size, but I’ve not found the step where the image is resized during forward. Personally, I think the image will be passed to the backbone firstly (which is resnet50), so I chose input image size = (224,224) (same as imagenet’s). But when trying this with ptflop, the output FLOPs is very unstable.

Any recommendation is appreciated! Thanks in advance!

pvtien96 · June 28, 2023, 12:56pm

The number of parameters is invariant with input size.
For the computational complexity (e.g, MACs), torchvision set default value for their models as follows:

detection_models_input_dims = {
    "fasterrcnn_mobilenet_v3_large_320_fpn": (320, 320),
    "fasterrcnn_mobilenet_v3_large_fpn": (800, 800),
    "fasterrcnn_resnet50_fpn": (800, 800),
    "fasterrcnn_resnet50_fpn_v2": (800, 800),
    "fcos_resnet50_fpn": (800, 800),
    "keypointrcnn_resnet50_fpn": (1333, 1333),
    "maskrcnn_resnet50_fpn": (800, 800),
    "maskrcnn_resnet50_fpn_v2": (800, 800),
    "retinanet_resnet50_fpn": (800, 800),
    "retinanet_resnet50_fpn_v2": (800, 800),
    "ssd300_vgg16": (300, 300),
    "ssdlite320_mobilenet_v3_large": (320, 320),
}

This can be verified here: vision/test/test_extended_models.py at 25c8a3a2cc2699e4e261b9e0777a6dc5badb5f9f · pytorch/vision · GitHub

More discussion https://github.com/pytorch/vision/pull/6936

Running ptflop with an input size of (800, 800) yields identical numbers as reported by torchvision.