Bad quantization outcome

I have trained a U2Net model.
I traced it to a TorchScript model to run on mobile.
Everything works: the results are the same as on PC.

But the model takes almost 200 MB of storage, so I decided to quantize it using static quantization.
I added QuantStub and DeQuantStub to the model, calling QuantStub at the start of forward() and DeQuantStub at the end.
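A minimal sketch of that stub placement, assuming PyTorch 1.13 or later, where eager-mode quantization lives under torch.ao.quantization (the newer home of torch.quantization). The wrapper class name is hypothetical; torch.ao.quantization.QuantWrapper does the same job for an existing module. After convert(), everything between the two stubs runs on quantized tensors, which is why a plain torch.add inside the body fails with the error below.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import DeQuantStub, QuantStub


class QuantizableNet(nn.Module):
    """Hypothetical wrapper: quantize the input on entry, dequantize on exit."""

    def __init__(self, body: nn.Module):
        super().__init__()
        self.quant = QuantStub()      # replaced by a Quantize module after convert()
        self.body = body              # runs on quantized tensors after convert()
        self.dequant = DeQuantStub()  # replaced by a DeQuantize module after convert()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.quant(x)
        x = self.body(x)
        return self.dequant(x)
```

Before prepare() and convert() are applied, both stubs are identity operations, so the float model behaves exactly as it did without them.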

Then I got the following error:

...
return torch.add(src_x, x2)

RuntimeError: Could not run ‘aten::add.Tensor’ with arguments from the ‘QuantizedCPU’ backend. This could be because the operator doesn’t exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit Internal Login for possible resolutions. ‘aten::add.Tensor’ is only available for these backends: [CPU, CUDA, MkldnnCPU, SparseCPU, SparseCUDA, Meta, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradNestedTensor, UNKNOWN_TENSOR_TYPE_ID, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

As already discussed here, I added a FloatFunctional object to the module class:
self.ff = FloatFunctional()
and replaced my addition
return torch.add(src_x, x2)
with
return self.ff.add(src_x, x2)
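A sketch of the same pattern for both operations mentioned in this thread (add and cat), assuming PyTorch 1.13+ (torch.ao.nn.quantized.FloatFunctional; older releases expose it as torch.nn.quantized.FloatFunctional). The block and its attribute names are hypothetical. One detail worth checking in the real model: each FloatFunctional instance carries a single output observer, so after convert() it has exactly one scale and zero point. Reusing one self.ff for every add and cat in the network forces all of those outputs to share one range, so the sketch gives each call site its own instance.

```python
import torch
import torch.nn as nn
from torch.ao.nn.quantized import FloatFunctional


class ResidualMerge(nn.Module):
    """Hypothetical U2Net-style block with one concat and one residual add."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv_a = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv_b = nn.Conv2d(channels, channels, 3, padding=1)
        self.reduce = nn.Conv2d(2 * channels, channels, 1)
        # One FloatFunctional per call site: each gets its own observer,
        # and therefore its own scale / zero point after convert().
        self.merge_cat = FloatFunctional()
        self.skip_add = FloatFunctional()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Replaces torch.cat([a, b], dim=1)
        merged = self.merge_cat.cat([self.conv_a(x), self.conv_b(x)], dim=1)
        # Replaces torch.add(x, y)
        return self.skip_add.add(x, self.reduce(merged))
```

In float mode these calls behave like torch.cat and torch.add; after convert() each FloatFunctional is swapped for a quantized counterpart that works on QuantizedCPU tensors.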

After that, trace() passed with no errors and the resulting model shrank to 45 MB (from nearly 200 MB).
I replaced my old model on mobile with this new one, and

  1. its output is very bad;
  2. inference time increased from ~7 s to ~18 s.

Code for tracing:

import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

from lib import U2NET_full

model_select = 'checkpoints/checkpoint.pth'
checkpoint = torch.load(model_select)

model = U2NET_full()
model = model.to('cpu')
if 'model' in checkpoint:
    model.load_state_dict(checkpoint['model'])
else:
    model.load_state_dict(checkpoint)

model.eval()
input = torch.rand(1, 3, 448, 448)

backend = "qnnpack"
model.qconfig = torch.quantization.get_default_qconfig(backend)
torch.backends.quantized.engine = backend
model_static_quantized = torch.quantization.prepare(model, inplace=False)
model_static_quantized = torch.quantization.convert(model_static_quantized, inplace=False)

torchscript_model = torch.jit.trace(model_static_quantized, input)
optimized_torchscript_model = optimize_for_mobile(torchscript_model)
optimized_torchscript_model.save("optimized_torchscript_model.pt")
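One thing stands out in the tracing code: prepare() is followed immediately by convert(), with no calibration pass in between. In eager-mode static quantization, prepare() only inserts observers; they record activation ranges when representative inputs are run through the prepared model. If no data is ever fed through, the activation observers have nothing to compute scales from, and PyTorch falls back to default quantization parameters (scale 1.0, zero point 0) with a warning. With a scale of 1.0, most activations collapse onto a handful of integer levels, which would plausibly turn a saliency map into noise. A minimal sketch of the flow with a calibration loop, assuming PyTorch 1.13+; TinyNet, pick_backend and quantize_static are hypothetical names, TinyNet stands in for U2NET_full, and random tensors stand in for real calibration images (which should be preprocessed exactly as at inference time).

```python
import torch
import torch.nn as nn
from torch.ao.quantization import (
    DeQuantStub,
    QuantStub,
    convert,
    get_default_qconfig,
    prepare,
)


class TinyNet(nn.Module):
    """Hypothetical stand-in for U2NET_full: conv -> relu -> conv -> sigmoid."""

    def __init__(self):
        super().__init__()
        self.quant = QuantStub()
        self.conv1 = nn.Conv2d(3, 8, 3, padding=1)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(8, 1, 3, padding=1)
        self.dequant = DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv1(x))
        x = torch.sigmoid(self.conv2(x))  # sigmoid has a quantized kernel
        return self.dequant(x)


def pick_backend() -> str:
    # Prefer qnnpack (used in the question); fall back to fbgemm if this build lacks it.
    engines = torch.backends.quantized.supported_engines
    return "qnnpack" if "qnnpack" in engines else "fbgemm"


def quantize_static(model, calibration_batches, backend):
    torch.backends.quantized.engine = backend
    model.eval()
    model.qconfig = get_default_qconfig(backend)
    prepared = prepare(model, inplace=False)  # inserts observers only
    # Calibration: observers record activation ranges from representative inputs.
    with torch.no_grad():
        for batch in calibration_batches:
            prepared(batch)
    return convert(prepared, inplace=False)  # observed ranges -> scale / zero point
```

In trace_model.py the equivalent change would be a loop over real images between prepare() and convert(), before torch.jit.trace(). If modules are fused, fusion has to happen before prepare().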

Can you suggest any way to fix this?

I followed the documentation and

  1. replaced my torch operations with their FloatFunctional equivalents (add and cat in my case);
  2. fused the modules that support fusion (conv2d -> bn -> relu sequences in my case).
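For reference, a sketch of the fusion step with torch.ao.quantization.fuse_modules, assuming PyTorch 1.13+ and a hypothetical ConvBnRelu block whose submodules are named conv, bn and relu (the names inside U2NET_full may differ). Conv/BN fusion for post-training quantization needs the model in eval mode and must run before prepare(); fuse_modules folds the BatchNorm into the convolution and replaces bn and relu with nn.Identity.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import fuse_modules


class ConvBnRelu(nn.Module):
    """Hypothetical conv -> bn -> relu unit."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.bn = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))


def fuse_all(model: nn.Module) -> nn.Module:
    model.eval()  # BN folding is only valid with frozen running statistics
    # Collect targets first so the module tree is not mutated mid-iteration.
    targets = [m for m in model.modules() if isinstance(m, ConvBnRelu)]
    for block in targets:
        fuse_modules(block, [["conv", "bn", "relu"]], inplace=True)
    return model
```

Because fusion in eval mode only re-parameterizes the convolution, the float output should be numerically unchanged; if it is not, the fusion list does not match the real forward order.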

But the results of my model are still wrong.

I will take a look at it later this week. Can you elaborate on what you mean by "Its output is very bad; it doesn't decrease in accuracy but is totally wrong"?

The second sentence was superfluous; "its output is very bad" is the accurate description.

Do you have any news on the subject?

Sorry, I haven't looked at it yet; I will today.

I figured that much, but can you give a little more info about what you mean by “output is very bad”? Is it slow? Is the accuracy low? If so, what is the difference in accuracies?

It just outputs noise.

I see. The reason I am asking is that we usually use accuracy as a metric. If you are using some other proxy for performance, let me know; otherwise, I will assume that the accuracy degrades after quantization. Also, I have my own implementation of UNet, but it would probably help if I could get yours, so that I can reproduce the issue locally.
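Since U2Net produces saliency maps rather than class labels, one simple proxy is to run the float and quantized models on the same preprocessed image and compare the outputs directly. A sketch, assuming both outputs have the same shape; the function name is hypothetical. It reports mean absolute error (a common saliency-map metric) and the signal-to-quantization-noise ratio (SQNR) in dB.

```python
import math

import torch


def output_drift(reference: torch.Tensor, candidate: torch.Tensor) -> dict:
    """Hypothetical helper: compare float vs quantized outputs for one input."""
    if reference.is_quantized:
        reference = reference.dequantize()
    if candidate.is_quantized:
        candidate = candidate.dequantize()
    ref = reference.detach().float()
    cand = candidate.detach().float()
    if ref.shape != cand.shape:
        raise ValueError(f"shape mismatch: {tuple(ref.shape)} vs {tuple(cand.shape)}")
    diff = ref - cand
    mae = diff.abs().mean().item()      # mean absolute error per element
    noise = diff.pow(2).mean().item()   # quantization noise power
    signal = ref.pow(2).mean().item()   # reference signal power
    if noise == 0.0:
        sqnr_db = math.inf
    elif signal == 0.0:
        sqnr_db = -math.inf
    else:
        sqnr_db = 10.0 * math.log10(signal / noise)
    return {"mae": mae, "sqnr_db": sqnr_db}
```

A higher SQNR means the quantized output tracks the float one more closely. Applying the same comparison to intermediate activations (dequantized first) can help narrow down where the quantization error becomes large.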

Here I have uploaded all the files needed to reproduce the issue.

  1. First, run python trace_model.py to produce a plain (non-quantized) TorchScript version of the weights.
  2. Then test the results on an image with python demo.py.

After that, uncomment the following lines in trace_model.py:

# backend = "qnnpack"
# model.qconfig = torch.quantization.get_default_qconfig(backend)
# torch.backends.quantized.engine = backend
# model = torch.quantization.prepare(model, inplace=False)
# model = torch.quantization.convert(model, inplace=False)

and repeat steps 1 and 2.

Have you seen the code?

Were you ever successful in this effort?