The packing format of quantized parameters after jitting

masahi · January 20, 2020, 9:47am

Hi, following the static quantization tutorial, I am trying to extract the parameters of a quantized and jitted model. It seems that after jitting, the parameters are packed in a way that I don’t understand. For example, if I run the snippet below after the tutorial script, I get the output that follows.

input_size = (1, 3, 224, 224)
inp = np.random.randn(*input_size).astype("float32")
trace = torch.jit.trace(per_channel_quantized_model, torch.from_numpy(inp))
state_dict = trace.state_dict()
for (k, v) in state_dict.items():
    print(k, v.size())

features.0.0._packed_params torch.Size([128])
features.1.conv.0.0._packed_params torch.Size([128])
features.1.conv.1._packed_params torch.Size([128])
features.2.conv.0.0._packed_params torch.Size([128])
features.2.conv.1.0._packed_params torch.Size([128])
features.2.conv.2._packed_params torch.Size([128])
features.3.conv.0.0._packed_params torch.Size([128])
features.3.conv.1.0._packed_params torch.Size([128])
features.3.conv.2._packed_params torch.Size([128])
features.4.conv.0.0._packed_params torch.Size([128])
features.4.conv.1.0._packed_params torch.Size([128])
features.4.conv.2._packed_params torch.Size([128])
features.5.conv.0.0._packed_params torch.Size([128])
features.5.conv.1.0._packed_params torch.Size([128])
features.5.conv.2._packed_params torch.Size([128])
features.6.conv.0.0._packed_params torch.Size([128])
features.6.conv.1.0._packed_params torch.Size([128])
features.6.conv.2._packed_params torch.Size([128])
features.7.conv.0.0._packed_params torch.Size([128])
features.7.conv.1.0._packed_params torch.Size([128])
features.7.conv.2._packed_params torch.Size([128])
features.8.conv.0.0._packed_params torch.Size([128])
features.8.conv.1.0._packed_params torch.Size([128])
features.8.conv.2._packed_params torch.Size([128])
features.9.conv.0.0._packed_params torch.Size([128])
features.9.conv.1.0._packed_params torch.Size([128])
features.9.conv.2._packed_params torch.Size([128])
features.10.conv.0.0._packed_params torch.Size([128])
features.10.conv.1.0._packed_params torch.Size([128])
features.10.conv.2._packed_params torch.Size([128])
features.11.conv.0.0._packed_params torch.Size([128])
features.11.conv.1.0._packed_params torch.Size([128])
features.11.conv.2._packed_params torch.Size([128])
features.12.conv.0.0._packed_params torch.Size([128])
features.12.conv.1.0._packed_params torch.Size([128])
features.12.conv.2._packed_params torch.Size([128])
features.13.conv.0.0._packed_params torch.Size([128])
features.13.conv.1.0._packed_params torch.Size([128])
features.13.conv.2._packed_params torch.Size([128])
features.14.conv.0.0._packed_params torch.Size([128])
features.14.conv.1.0._packed_params torch.Size([128])
features.14.conv.2._packed_params torch.Size([128])
features.15.conv.0.0._packed_params torch.Size([128])
features.15.conv.1.0._packed_params torch.Size([128])
features.15.conv.2._packed_params torch.Size([128])
features.16.conv.0.0._packed_params torch.Size([128])
features.16.conv.1.0._packed_params torch.Size([128])
features.16.conv.2._packed_params torch.Size([128])
features.17.conv.0.0._packed_params torch.Size([128])
features.17.conv.1.0._packed_params torch.Size([128])
features.17.conv.2._packed_params torch.Size([128])
features.18.0._packed_params torch.Size([128])
quant.scale torch.Size([1])
quant.zero_point torch.Size([1])
classifier.1._packed_params._packed_params torch.Size([104])

I don’t understand this format and I have many questions, but for now let me ask these:

1. Is there documentation of the packing format?
2. How can I extract the original floating point tensors along with their scale and zero point? I confirmed that they are available before tracing.
3. Or even better, is there a way to prevent packing?
4. During tracing, where in the code base does this packing happen?

I’m trying to translate a jitted, quantized PyTorch model to TVM IR. For that I need the floating point tensors along with their scales and zero points, which is why I’m asking here.
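For readers new to the terms: under the usual affine quantization scheme, a stored integer q maps back to a real value as scale * (q - zero_point). The following is a minimal pure-Python sketch of that relationship, not PyTorch's implementation. The function names are made up for illustration, and the per-channel variant assumes one scale and zero point per row (output channel).

```python
def dequantize_per_tensor(q_values, scale, zero_point):
    """Map quantized integers back to floats with a single scale / zero point."""
    return [scale * (q - zero_point) for q in q_values]


def dequantize_per_channel(q_rows, scales, zero_points):
    """Same mapping, but each row (channel) has its own scale / zero point."""
    return [
        dequantize_per_tensor(row, s, zp)
        for row, s, zp in zip(q_rows, scales, zero_points)
    ]


# Hypothetical values, chosen only to show the arithmetic.
per_tensor = dequantize_per_tensor([0, 10, -10], scale=0.5, zero_point=0)
per_channel = dequantize_per_channel(
    [[2, 4], [2, 4]], scales=[0.5, 0.25], zero_points=[0, 2]
)
```

A consumer such as a TVM frontend needs exactly these pieces per layer: the integer (or dequantized) weight, the scale(s), and the zero point(s).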

cc @raghuramank100 @jerryzh168

masahi · January 20, 2020, 10:30am

OK, torch.ops.quantized.conv2d_unpack did the job.

johnzhou1996 · March 11, 2020, 8:32am

Hello, I ran into the same problem. Could you show in detail how you used “torch.ops.quantized.conv2d_unpack”? And how should I deal with classifier.1._packed_params?
Thanks!

masahi · March 11, 2020, 9:15am

See the implementation I added to TVM:

https://github.com/apache/incubator-tvm/blob/06e9542ee0bfd014bd06a4dd4fdb3af9d2d29eb0/python/tvm/relay/frontend/qnn_torch.py#L50-L100

def _unpack_quant_params(param_name, packed_params, unpack_func):
    # Torch stores quantized params in a custom packed format,
    # need to unpack and retrieve them as numpy arrays
    qweight, bias = unpack_func(packed_params)
    weight_np = qweight.dequantize().numpy()

    import torch
    if qweight.qscheme() == torch.per_tensor_affine:
        param = QNNParam(weight_np, bias, qweight.q_scale(),
                         int(qweight.q_zero_point()), param_name)
    else:
        scales = qweight.q_per_channel_scales().numpy()
        zero_points = qweight.q_per_channel_zero_points().numpy()
        # This is an assumption posed by QNN
        msg = "The values of zero points should be all zero for per channel"
        assert np.all(zero_points == 0), msg
        param = QNNParam(weight_np, bias, scales, 0, param_name)

    return param

From the name classifier.1._packed_params, I guess it comes from nn.Linear. In that case, you need to use torch.ops.quantized.linear_unpack.
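As a rough illustration of the linear case, here is a small round-trip sketch. It assumes a PyTorch build with a quantized backend (fbgemm or qnnpack) available. The weight, bias, scale, and zero point are made-up values, and torch.ops.quantized.linear_prepack is used only to create a packed object to unpack; in a real model the packed object would come from the quantized module instead. torch.ops.quantized.conv2d_unpack should be usable the same way for conv layers.

```python
import torch

# Hypothetical float weight and bias standing in for a trained nn.Linear layer.
weight = torch.randn(4, 8)
bias = torch.randn(4)

# Quantize the weight per tensor to qint8 (scale and zero point chosen arbitrarily).
qweight = torch.quantize_per_tensor(
    weight, scale=0.05, zero_point=0, dtype=torch.qint8
)

# Pack the weight and bias into the backend's opaque format, then unpack again.
packed = torch.ops.quantized.linear_prepack(qweight, bias)
unpacked_qweight, unpacked_bias = torch.ops.quantized.linear_unpack(packed)

# Recover the pieces a converter needs: float weights, scale, and zero point.
weight_fp = unpacked_qweight.dequantize()
scale = unpacked_qweight.q_scale()
zero_point = unpacked_qweight.q_zero_point()
```

For per-channel quantized weights, q_per_channel_scales() and q_per_channel_zero_points() take the place of q_scale() and q_zero_point(), as in the TVM snippet above.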

k.osama · July 22, 2020, 7:28pm

Hi, I am working with a quantized model in C++. Can I parse the jitted model parameters like this in C++? I could not find any unpacking modules in torch::jit::script::Module. I trained and quantized my model in Python and loaded it in C++. I am using version 1.6.0+