Quantized Conv1d cannot execute on ARM CPU

I’m trying to run quantized NN inference on a DPU, so I chose the NVIDIA BlueField-2 as my hardware platform, which has an Armv8 A72 embedded CPU. I then installed the latest PyTorch 2.0.1 from pip. I wrote the following demo code and got this error:


import torch
layer = torch.nn.quantized.Conv1d(in_channels=1, out_channels=32, kernel_size=3)
x = torch.rand(1, 1, 748)
qx = torch.quantize_per_tensor(x, scale=1.0, zero_point=0, dtype=torch.quint8)
out = layer(qx)  # this call raises the error below


Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/ao/nn/quantized/modules/conv.py", line 369, in forward
    return ops.quantized.conv1d(input, self._packed_params, self.scale, self.zero_point)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/_ops.py", line 502, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: could not create a primitive descriptor for a reorder primitive

I don’t really understand what a primitive descriptor and a reorder primitive are. What could be the reason for this error?

Hi @Xingyu_Yan, what hardware are you using to run this code? Is it your specific target architecture or just a general computer?

Not sure. It should be noted that the error is occurring somewhere in PyTorch’s dispatch system rather than specifically in the quantization code, which makes it harder to debug. Assuming this works well with a normal fp32 model, I would try testing with a quantized linear rather than a quantized conv. Otherwise, I’ve not seen much about DPUs + quantization; I’ll check with the rest of the team, but it’s entirely possible this type of thing simply isn’t supported at the moment, and you may be better off looking for guidance somewhere with more DPU experience.
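To make the suggestion concrete, a minimal sketch of the quantized-linear check might look like this (the feature sizes and quantization parameters here are arbitrary placeholders, not taken from the original model):

```python
import torch

# Sketch of the suggested quantized Linear test; sizes and
# quantization parameters are placeholders chosen for illustration.
layer = torch.nn.quantized.Linear(in_features=748, out_features=32)
x = torch.rand(1, 748)
qx = torch.quantize_per_tensor(x, scale=1.0, zero_point=0, dtype=torch.quint8)
out = layer(qx)  # if this runs, a quantized linear kernel exists for this CPU
```

If this forward pass succeeds where the Conv1d one fails, that points at a missing quantized conv kernel rather than a general quantization problem.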

Thanks for replying!
This is my hardware: NVIDIA BlueField-2 DPU Datasheet. It’s a DPU (a network interface card with a powerful processor, in my understanding) with an embedded Armv8 core (AArch64 architecture), so I can run my code just like I usually do on a normal CPU.

More information about the OS running on it. I got this from executing cat /proc/version:

Linux version 5.4.0-1049-bluefield (buildd@bos02-arm64-055) (gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1))

Thanks for replying!
I’ve tried running a normal fp32 model and torch.nn.quantized.Linear on the DPU, and everything works well. So in the end I only quantized the linear layers in my model.
I agree with your guess. Maybe PyTorch just doesn’t support this type of processor yet.
By the way, what should I do if I really want to figure out the reason? Read the source code of the dispatch system (if it’s open source) and look for information about my processor?
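For reference, quantizing only the linear layers of a model can be sketched with dynamic quantization (a different API from the static torch.nn.quantized modules used above; the model and its shapes here are placeholders, not the original model):

```python
import torch

# Sketch: quantize only the Linear layers via dynamic quantization,
# leaving everything else (e.g. the conv) in fp32.
model = torch.nn.Sequential(
    torch.nn.Conv1d(1, 8, 3),       # stays in fp32
    torch.nn.ReLU(),
    torch.nn.Flatten(),
    torch.nn.Linear(8 * 746, 32),   # replaced by a quantized version
)
qmodel = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
out = qmodel(torch.rand(1, 1, 748))  # fp32 in/out; only Linear is quantized
```

Dynamic quantization keeps activations in fp32 and quantizes only the weights of the listed module types, which sidesteps missing quantized kernels for the other layers.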

The dispatch system is notoriously difficult to learn just by reading the code. The go-to resource is http://blog.ezyang.com/2019/05/pytorch-internals/ but it specifically mentions not covering distributed, so I don’t know. To actually fix this I’d suspect you’d need to raise an issue on GitHub or otherwise find someone with DPU experience. Sorry I couldn’t be more helpful.


The other thing to try is conv2d. I know that in some weird corners of the dispatch logic, quantized conv1d reshapes things and then just calls conv2d instead, so there’s a possibility that it’s looking for a quantized conv1d kernel and not finding it, but it’d work where a kernel exists (like quantized linear).
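The conv2d check above can be sketched like this (the input shape and quantization parameters are placeholders for illustration):

```python
import torch

# Sketch of the suggested quantized Conv2d test; shapes and
# quantization parameters are placeholders.
layer = torch.nn.quantized.Conv2d(in_channels=1, out_channels=32, kernel_size=3)
x = torch.rand(1, 1, 28, 28)
qx = torch.quantize_per_tensor(x, scale=1.0, zero_point=0, dtype=torch.quint8)
out = layer(qx)  # succeeds where a quantized conv2d kernel exists
```

If conv2d fails with the same error as conv1d, the missing piece is the quantized conv kernel itself rather than the conv1d-to-conv2d reshape path.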

Thank you for your helpful suggestion. I have tried using conv2d but encountered the same error as with conv1d. I will continue to learn from the resources you have provided and may raise an issue on GitHub in a few weeks’ time.

Once again, thank you for all your help.