Bad quantization outcome

I have trained a U2-Net model.
I traced it into a TorchScript model to run on mobile.
It all works fine; the results are the same as on PC.

But the model takes almost 200 MB of storage space, so I decided to quantize it using static quantization.
I added QuantStub and DeQuantStub objects to the model and used them at the start of the forward() method (QuantStub) and at the end (DeQuantStub).
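For clarity, the stub wiring looks roughly like this (a minimal sketch with an illustrative block, not the actual U2-Net code):

import torch
from torch import nn
from torch.quantization import QuantStub, DeQuantStub

class ExampleBlock(nn.Module):
    # illustrative module, not the real U2-Net block
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # quantizes the float input at runtime
        self.conv = nn.Conv2d(3, 16, 3, padding=1)
        self.relu = nn.ReLU()
        self.dequant = DeQuantStub()  # converts the output back to float

    def forward(self, x):
        x = self.quant(x)             # QuantStub at the start of forward()
        x = self.relu(self.conv(x))
        return self.dequant(x)        # DeQuantStub at the end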

Then I got the following error:

...
return torch.add(src_x, x2)

RuntimeError: Could not run 'aten::add.Tensor' with arguments from the 'QuantizedCPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit Internal Login for possible resolutions. 'aten::add.Tensor' is only available for these backends: [CPU, CUDA, MkldnnCPU, SparseCPU, SparseCUDA, Meta, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradNestedTensor, UNKNOWN_TENSOR_TYPE_ID, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

As already discussed here, I added a FloatFunctional object to the module class:
self.ff = FloatFunctional()
and replaced my addition instruction
return torch.add(src_x, x2)
with the new one:
return self.ff.add(src_x, x2)
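The pattern looks roughly like this (a minimal sketch with an illustrative module; only the add/cat wiring matches my real code):

from torch import nn
from torch.nn.quantized import FloatFunctional

class ExampleBlock(nn.Module):
    # illustrative module, not the real U2-Net block
    def __init__(self):
        super().__init__()
        self.ff = FloatFunctional()

    def forward(self, src_x, x2):
        # FloatFunctional ops can be observed and quantized, unlike torch.add / torch.cat
        # (for concatenation the equivalent is self.ff.cat([src_x, x2], dim=1))
        return self.ff.add(src_x, x2)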

After that, trace() passed with no errors and the resulting model decreased in size (45 MB instead of nearly 200 MB).
I simply replaced my old model on mobile with this new one, and:

  1. its output is very bad;
  2. inference time increased from ~7 s to ~18 s.

Code for tracing:

import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

from lib import U2NET_full

model_select = 'checkpoints/checkpoint.pth'
checkpoint = torch.load(model_select)

model = U2NET_full()
model = model.to('cpu')
if 'model' in checkpoint:
    model.load_state_dict(checkpoint['model'])
else:
    model.load_state_dict(checkpoint)

model.eval()
input = torch.rand(1, 3, 448, 448)

backend = "qnnpack"
model.qconfig = torch.quantization.get_default_qconfig(backend)
torch.backends.quantized.engine = backend
model_static_quantized = torch.quantization.prepare(model, inplace=False)
model_static_quantized = torch.quantization.convert(model_static_quantized, inplace=False)

torchscript_model = torch.jit.trace(model_static_quantized, input)
optimized_torchscript_model = optimize_for_mobile(torchscript_model)
optimized_torchscript_model.save("optimized_torchscript_model.pt")
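For context, the documentation's eager-mode static quantization flow has a calibration pass between prepare() and convert(); a minimal sketch of that step, assuming a calibration_images iterable of representative inputs (it is not in my script above):

model_prepared = torch.quantization.prepare(model, inplace=False)
with torch.no_grad():
    for img in calibration_images:   # assumed: representative (1, 3, 448, 448) float tensors
        model_prepared(img)          # forward passes let the observers record activation ranges
model_static_quantized = torch.quantization.convert(model_prepared, inplace=False)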

Could you suggest any way to fix this?

I followed the documentation and

  1. replaced my torch operations with their FloatFunctional alternatives (add and cat in my case);
  2. fused the modules that support fusion (the conv2d -> bn -> relu sequences in my case); see the sketch below.

But the results of my model are still wrong.
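For reference, the fusion call I mean looks roughly like this (a minimal sketch; the real U2-Net submodule names differ):

import torch
from torch import nn

# illustrative conv -> bn -> relu block
block = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU())
block.eval()  # fusion for post-training quantization expects eval() mode
fused = torch.quantization.fuse_modules(block, [['0', '1', '2']])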

I will take a look at it later this week. Can you elaborate on what you mean by "Its output is very bad; it didn't just decrease in accuracy, it is totally wrong"?

The second sentence was superfluous;
'its output is very bad' is the actual one.

Do you have any news on the subject?

Sorry, I didn't take a look yet; I will do it today.

I figured that much, but can you give a little more info about what you mean by “output is very bad”? Is it slow? Is the accuracy low? If so, what is the difference in accuracies?


It just outputs noise

I see; the reason I am asking is that we usually use accuracy as a metric. If you are using any proxy for performance, let me know. Otherwise, I will just assume that the accuracy degrades after quantization. Also, I have my own implementation of the unet, but it would probably be helpful if I could get yours; that way, I will be able to reproduce the issue locally.

Here I uploaded all the files needed to reproduce the case.

  1. First, run python trace_model.py to make a simple TorchScript version of the weights.
  2. Then test the results on an image with python demo.py.

After that, uncomment the following lines in trace_model.py:

# backend = "qnnpack"
# model.qconfig = torch.quantization.get_default_qconfig(backend)
# torch.backends.quantized.engine = backend
# model = torch.quantization.prepare(model, inplace=False)
# model = torch.quantization.convert(model, inplace=False)

and repeat steps 1 and 2 again.

Have you seen the code?

Were you ever successful in this effort?