Bad quantization outcome

I have trained a U2-Net model.
I traced it into a TorchScript model to run on mobile.
It all works fine; the results are the same as on PC.

But the model takes almost 200 MB of storage space, so I decided to quantize it using static quantization.
I added QuantStub and DeQuantStub objects to the model and used them at the start of the forward() method (QuantStub) and at the end (DeQuantStub).
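For clarity, the stub wiring looks roughly like this (a minimal sketch with an illustrative block, not the actual U2-Net code):

import torch
from torch import nn
from torch.quantization import QuantStub, DeQuantStub

class ExampleBlock(nn.Module):
    # illustrative module, not the real U2-Net block
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # quantizes the float input at runtime
        self.conv = nn.Conv2d(3, 16, 3, padding=1)
        self.relu = nn.ReLU()
        self.dequant = DeQuantStub()  # converts the output back to float

    def forward(self, x):
        x = self.quant(x)             # QuantStub at the start of forward()
        x = self.relu(self.conv(x))
        return self.dequant(x)        # DeQuantStub at the end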

Then I got the following error:

...
return torch.add(src_x, x2)

RuntimeError: Could not run 'aten::add.Tensor' with arguments from the 'QuantizedCPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit Internal Login for possible resolutions. 'aten::add.Tensor' is only available for these backends: [CPU, CUDA, MkldnnCPU, SparseCPU, SparseCUDA, Meta, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradNestedTensor, UNKNOWN_TENSOR_TYPE_ID, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

As already discussed here, I added a FloatFunctional object to the module class:
self.ff = FloatFunctional()
and replaced my addition instruction
return torch.add(src_x, x2)
with the new one:
return self.ff.add(src_x, x2)
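The pattern looks roughly like this (a minimal sketch with an illustrative module; only the add/cat wiring matches my real code):

from torch import nn
from torch.nn.quantized import FloatFunctional

class ExampleBlock(nn.Module):
    # illustrative module, not the real U2-Net block
    def __init__(self):
        super().__init__()
        self.ff = FloatFunctional()

    def forward(self, src_x, x2):
        # FloatFunctional ops can be observed and quantized, unlike torch.add / torch.cat
        # (for concatenation the equivalent is self.ff.cat([src_x, x2], dim=1))
        return self.ff.add(src_x, x2)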

After that, trace() passed with no errors and the resulting model decreased in size (45 MB instead of nearly 200 MB).
I simply replaced my old model on mobile with this new one, and:

  1. its output is very bad;
  2. inference time increased from ~7 s to ~18 s.

Code for tracing:

import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

from lib import U2NET_full

model_select = 'checkpoints/checkpoint.pth'
checkpoint = torch.load(model_select)

model = U2NET_full()
model = model.to('cpu')
if 'model' in checkpoint:
    model.load_state_dict(checkpoint['model'])
else:
    model.load_state_dict(checkpoint)

model.eval()
input = torch.rand(1, 3, 448, 448)

backend = "qnnpack"
model.qconfig = torch.quantization.get_default_qconfig(backend)
torch.backends.quantized.engine = backend
model_static_quantized = torch.quantization.prepare(model, inplace=False)
model_static_quantized = torch.quantization.convert(model_static_quantized, inplace=False)

torchscript_model = torch.jit.trace(model_static_quantized, input)
optimized_torchscript_model = optimize_for_mobile(torchscript_model)
optimized_torchscript_model.save("optimized_torchscript_model.pt")
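For context, the documentation's eager-mode static quantization flow has a calibration pass between prepare() and convert(); a minimal sketch of that step, assuming a calibration_images iterable of representative inputs (it is not in my script above):

model_prepared = torch.quantization.prepare(model, inplace=False)
with torch.no_grad():
    for img in calibration_images:   # assumed: representative (1, 3, 448, 448) float tensors
        model_prepared(img)          # forward passes let the observers record activation ranges
model_static_quantized = torch.quantization.convert(model_prepared, inplace=False)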

Could you suggest any way to fix this?

I followed the documentation and

  1. replaced my torch operations with their FloatFunctional alternatives (add and cat in my case);
  2. fused the modules that support fusion (the conv2d -> bn -> relu sequences in my case); see the sketch below.

But the results of my model are still wrong.
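For reference, the fusion call I mean looks roughly like this (a minimal sketch; the real U2-Net submodule names differ):

import torch
from torch import nn

# illustrative conv -> bn -> relu block
block = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU())
block.eval()  # fusion for post-training quantization expects eval() mode
fused = torch.quantization.fuse_modules(block, [['0', '1', '2']])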

I will take a look at it later this week. Can you elaborate on what you mean by "Its output is very bad; it didn't just decrease in accuracy, it is totally wrong"?

The second sentence was superfluous;
'its output is very bad' is the actual one.

Do you have any news on the subject?

Sorry, I didn't take a look yet; I will do it today.

I figured that much, but can you give a little more info about what you mean by “output is very bad”? Is it slow? Is the accuracy low? If so, what is the difference in accuracies?


It just outputs noise

I see; the reason I am asking is that we usually use accuracy as a metric. If you are using any proxy for performance, let me know. Otherwise, I will just assume that the accuracy degrades after quantization. Also, I have my own implementation of the unet, but it would probably be helpful if I could get yours; that way, I will be able to reproduce the issue locally.

Here I uploaded all the files needed to reproduce the case.

  1. First, run python trace_model.py to make a simple TorchScript version of the weights.
  2. Then test the results on an image with python demo.py.

After that, uncomment the following lines in trace_model.py:

# backend = "qnnpack"
# model.qconfig = torch.quantization.get_default_qconfig(backend)
# torch.backends.quantized.engine = backend
# model = torch.quantization.prepare(model, inplace=False)
# model = torch.quantization.convert(model, inplace=False)

and repeat steps 1 and 2 again.

Have you seen the code?

Were you ever successful in this effort?