Bilinear is slower than nearest after QAT

In my original model, the upsample part is

F.interpolate(l7, scale_factor=2.0, mode='bilinear', align_corners=True)

When I ran QAT and tried the model on Android, the inference time was slow, roughly the same as the float model. So I changed the upsample part to

F.interpolate(l7, scale_factor=2.0, mode='nearest')

and the inference time improved.
But the segmentation results became much worse.
Why is bilinear slower than nearest after QAT?
Can anyone explain this and give some suggestions?


Actually, I don't know much about QAT, but nearest is always faster than bilinear. Bilinear has to compute a weighted average of neighbouring pixels for every output pixel, while nearest is just a copy/paste with (almost) no computation.

That said, for large tensors bilinear may still be preferable, even when speed is taken into account.
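The per-pixel cost difference can be seen in a minimal pure-Python sketch (my own illustration, not the quantized kernels themselves; single-channel image as a list of lists, integer scale factor): nearest reads one source pixel per output pixel, while bilinear (here in the `align_corners=True` convention) reads four and blends them with fractional weights.

```python
def upsample_nearest(img, scale=2):
    """Nearest: each output pixel is a plain copy of one source pixel."""
    h, w = len(img), len(img[0])
    return [[img[y // scale][x // scale]
             for x in range(w * scale)]
            for y in range(h * scale)]

def upsample_bilinear(img, scale=2):
    """Bilinear (align_corners=True): each output pixel blends 4 neighbours.

    Per pixel this costs 4 reads plus several multiplies/adds, versus a
    single read for nearest -- which is why it is inherently slower.
    """
    h, w = len(img), len(img[0])
    oh, ow = h * scale, w * scale
    out = [[0.0] * ow for _ in range(oh)]
    for oy in range(oh):
        # map the output coordinate back into input space
        sy = oy * (h - 1) / (oh - 1) if oh > 1 else 0.0
        y0 = int(sy); y1 = min(y0 + 1, h - 1); fy = sy - y0
        for ox in range(ow):
            sx = ox * (w - 1) / (ow - 1) if ow > 1 else 0.0
            x0 = int(sx); x1 = min(x0 + 1, w - 1); fx = sx - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            out[oy][ox] = top * (1 - fy) + bot * fy
    return out
```

Note that bilinear produces float intermediate values, which is also why it interacts less cleanly with an int8 quantized pipeline than nearest does.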


Quantization-aware training (QAT) is the quantization method that typically results in the highest accuracy.
You are right, nearest is always faster than bilinear.
I tested it on Android: for the float model the difference is less than 5 ms, but for the quantized model it is around 100 ms.
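One possible workaround (my assumption, not verified on this model — it needs profiling on the target backend): keep the bilinear upsample in float by wrapping it between `DeQuantStub`/`QuantStub`, so the quantized backend only runs the ops it handles quickly. A minimal sketch of such a hypothetical module:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.ao.quantization import QuantStub, DeQuantStub

class FloatUpsample(nn.Module):
    """Hypothetical module: dequantize -> bilinear upsample in fp32 -> requantize.

    The surrounding graph stays quantized; only the interpolate runs in float.
    Whether this is a net win depends on the backend, so measure it.
    """

    def __init__(self):
        super().__init__()
        self.dequant = DeQuantStub()
        self.quant = QuantStub()

    def forward(self, x):
        x = self.dequant(x)  # leave the quantized region
        x = F.interpolate(x, scale_factor=2.0, mode='bilinear',
                          align_corners=True)
        return self.quant(x)  # re-enter the quantized region
```

Before `prepare`/`convert`, the stubs act as identity ops, so the module can be dropped into the existing QAT workflow unchanged.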
