Relation between GFLOPs and inference time

I am working on a real-time scenario where inference time is extremely important. I modified the YOLOv11 model and got:

No. of params = 1.1 M

GFLOPs = 3.1

Model size = 2.9 MB

Inference = 14.1 ms

The original YOLOv11n:

No. of params = 2.6 M

GFLOPs = 6.4

Model size = 5.2 MB

Inference = 8.3 ms

My question: if the GFLOPs of my modified model are lower than the original's, why is my model slower?

I used this code to check the inference:

from ultralytics import YOLO

model = YOLO("/content/drive/MyDrive/yolov11/runs/detect/train/weights/best.pt")

img = "/content/drive/MyDrive/ultralytics/datasets/UAV/images/val/00678.jpg"

# warmup on GPU

model.predict(img, imgsz=640, device=0, half=True, verbose=False)

# timed run

r = model.predict(img, imgsz=640, device=0, half=True, verbose=False)

print(r[0].speed)  # {'preprocess': ..., 'inference': ..., 'postprocess': ...}
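One caveat worth noting: a single timed call on GPU is noisy. A minimal sketch of averaging the reported per-image inference time over repeated runs, continuing directly from the snippet above (it reuses the same model, img, and the r[0].speed dict already shown):

# average the reported inference time over many runs instead of one
times = []
for _ in range(100):
    r = model.predict(img, imgsz=640, device=0, half=True, verbose=False)
    times.append(r[0].speed["inference"])  # per-image inference time in ms

print(f"mean inference: {sum(times) / len(times):.2f} ms")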

I haven't used Ultralytics, but reducing GFLOPs should, in principle, reduce inference time, so these numbers are surprising. What exactly is the difference in inference time? Did you try using a batch of samples instead of a single image?

Can you clarify a few things, please?

You have a model X with GFLOPs = 3.1 and a model Y with GFLOPs = 6.4. Assuming the same hardware setup for both models:

  1. What is the inference time in seconds that you get for both models X and Y?

Currently, as shown in your code, you are loading a single image.

  2. What happens to the inference time if you use a batch of samples (say, 32 images)? (A sketch follows below.)
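A minimal sketch of batched inference, assuming a recent Ultralytics release where predict accepts a batch argument for directory sources; the weights and folder paths here are hypothetical placeholders:

from ultralytics import YOLO

model = YOLO("best.pt")  # hypothetical weights path

# run inference over a folder of validation images, 32 at a time
results = model.predict(
    "datasets/UAV/images/val",  # hypothetical folder path
    imgsz=640, device=0, half=True, batch=32, verbose=False,
)

# speed is still reported per image, so average it across results
mean_ms = sum(r.speed["inference"] for r in results) / len(results)
print(f"mean inference per image: {mean_ms:.2f} ms")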

Also, try using PyTorch Profiler to verify if there is a computation bottleneck.
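A minimal profiling sketch with torch.profiler, assuming the Ultralytics YOLO object exposes the underlying PyTorch module as model.model (the weights path is a placeholder):

import torch
from torch.profiler import profile, ProfilerActivity
from ultralytics import YOLO

model = YOLO("best.pt")  # hypothetical weights path
net = model.model.cuda().half().eval()  # assumes model.model is the underlying nn.Module
x = torch.randn(1, 3, 640, 640, device="cuda", dtype=torch.float16)

with torch.no_grad(), profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    net(x)

# ops sorted by GPU time show where the computation bottleneck is
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))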

This is the output for my model:

Speed: 1.6ms preprocess, 12.4ms inference, 3.6ms postprocess per image at shape (1, 3, 384, 640)

Processed 100 images

This is for the original one for the same 100 images:

Speed: 1.5ms preprocess, 7.9ms inference, 1.1ms postprocess per image at shape (1, 3, 384, 640)

Processed 100 images