stev
October 27, 2025, 5:45pm
1
I am working on a real-time scenario where inference time is extremely important. I modified the YOLOv11 model and got:
No. of params = 1.1M
GFLOPs = 3.1
Model size = 2.9 MB
Inference = 14.1 ms
The original YOLOv11n:
No. of params = 2.6M
GFLOPs = 6.4
Model size = 5.2 MB
Inference = 8.3 ms
My question: if the GFLOPs of my modified model are lower than the original's, why is my model slower?
I used this code to check the inference time:
from ultralytics import YOLO

model = YOLO("/content/drive/MyDrive/yolov11/runs/detect/train/weights/best.pt")
img = "/content/drive/MyDrive/ultralytics/datasets/UAV/images/val/00678.jpg"

# warmup on GPU
model.predict(img, imgsz=640, device=0, half=True, verbose=False)

# timed run
r = model.predict(img, imgsz=640, device=0, half=True, verbose=False)
print(r[0].speed)  # {'preprocess': ..., 'inference': ..., 'postprocess': ...}
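For a steadier number, the same snippet can be extended to average over repeated runs (a rough sketch; the warmup and run counts are arbitrary):

import time
import torch
# assumes `model` and `img` from the snippet above

# warmup
for _ in range(10):
    model.predict(img, imgsz=640, device=0, half=True, verbose=False)

# timed loop, averaged over n runs
torch.cuda.synchronize()
n = 100
t0 = time.perf_counter()
for _ in range(n):
    model.predict(img, imgsz=640, device=0, half=True, verbose=False)
torch.cuda.synchronize()
print(f"avg end-to-end: {(time.perf_counter() - t0) / n * 1000:.1f} ms")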
I haven't used Ultralytics; however, reducing GFLOPs should, in principle, reduce inference time. What is the difference in inference time? Did you try using a batch of samples instead of a single sample?
stev
October 29, 2025, 8:21am
3
Can you clarify more, please?
You have a model X with GFLOPs = 3.1 and a model Y with GFLOPs = 6.4. Assuming the same hardware setup for both models:
What is the inference time in seconds that you get for both models X and Y?
Currently, as shown in your code, you are loading a single image.
What happens to the inference time if you use a batch of samples (say, 32 images)?
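For instance, something along these lines (a rough sketch; the checkpoint path, image directory, and batch size are placeholders):

from pathlib import Path
from ultralytics import YOLO

model = YOLO("best.pt")  # placeholder checkpoint path
imgs = [str(p) for p in sorted(Path("images/val").glob("*.jpg"))][:32]  # placeholder directory

# warmup
model.predict(imgs[0], imgsz=640, device=0, half=True, verbose=False)

# per-image speeds are reported in each Results object
results = model.predict(imgs, imgsz=640, device=0, half=True, verbose=False)
avg = sum(r.speed["inference"] for r in results) / len(results)
print(f"avg inference per image: {avg:.1f} ms over {len(results)} images")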
Also, try using PyTorch Profiler to verify if there is a computation bottleneck.
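A minimal sketch of that (the checkpoint path is a placeholder; this profiles one raw forward pass of the underlying PyTorch module on a CUDA device):

import torch
from torch.profiler import profile, ProfilerActivity
from ultralytics import YOLO

model = YOLO("best.pt")  # placeholder checkpoint path
net = model.model.cuda().half().eval()  # the underlying nn.Module
im = torch.zeros(1, 3, 640, 640, device="cuda", dtype=torch.float16)

# profile one forward pass and report the most expensive ops
with torch.no_grad(), profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    net(im)
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=15))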
stev
November 3, 2025, 9:40am
5
This is the output for my model:
Speed: 1.6ms preprocess, 12.4ms inference, 3.6ms postprocess per image at shape (1, 3, 384, 640)
Processed 100 images
This is for the original one for the same 100 images:
Speed: 1.5ms preprocess, 7.9ms inference, 1.1ms postprocess per image at shape (1, 3, 384, 640)
Processed 100 images