Bilinear is slower than nearest after QAT

In my original model, the upsample part is:

F.interpolate(l7, scale_factor=2.0, mode='bilinear', align_corners=True)

When I export the QAT model.pt and try it on Android, its inference time is slow, about the same as the float.pt.
So I changed the upsample part to:

F.interpolate(l7, scale_factor=2.0, mode='nearest')

This speeds up inference, but the segmentation results become much worse.
Why is bilinear slower than nearest after QAT?
Can anyone explain this and give some suggestions?
Thanks

Hi,

Actually, I do not know about QAT, but nearest is always faster than bilinear. In bilinear mode an interpolation has to be computed for each output element, whereas nearest is (almost) just a copy/paste with no arithmetic.
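To make the cost difference concrete, here is a hypothetical plain-Python sketch (1-D analogue, not PyTorch's actual kernel) contrasting the per-output-element work: nearest just copies one input element, while linear (the 1-D counterpart of bilinear) blends two neighbours with extra multiplies and adds. The function names and the `align_corners=True`-style index mapping are illustrative assumptions.

```python
def upsample_nearest_1d(src, scale=2):
    # Nearest: each output element is a plain copy of one input element.
    n_out = len(src) * scale
    return [src[min(int(i / scale), len(src) - 1)] for i in range(n_out)]

def upsample_linear_1d(src, scale=2):
    # Linear (1-D analogue of bilinear): each output element is a weighted
    # average of two neighbours, so it needs extra multiplies and adds.
    n_out = len(src) * scale
    out = []
    for i in range(n_out):
        # align_corners=True style mapping of output index to input coordinate
        x = i * (len(src) - 1) / (n_out - 1) if n_out > 1 else 0.0
        lo = int(x)
        hi = min(lo + 1, len(src) - 1)
        w = x - lo
        out.append(src[lo] * (1 - w) + src[hi] * w)
    return out

print(upsample_nearest_1d([0.0, 10.0]))  # pure copies: [0.0, 0.0, 10.0, 10.0]
print(upsample_linear_1d([0.0, 10.0]))   # interpolated in-between values
```

In 2-D the gap widens further: bilinear reads four neighbours per output pixel instead of two, and on a quantized backend that arithmetic also involves requantization, which is one plausible reason the gap is so much larger after QAT.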

That said, for large tensors linear may still be preferable, possibly even in terms of speed.

Bests

Quantization-aware training (QAT) is the quantization method that typically results in the highest accuracy.
You are right, nearest is always faster than bilinear.
I tested it on Android: for the float model the difference between the two modes is less than 5 ms, but for the quantized model it is more than about 100 ms.
