I’m new to quantization, so I couldn’t figure out a way to easily reproduce this without going through the whole flow.
pdb during a forward pass of a quantized model:
```python
print(x.dtype)                             # >> torch.quint8
print(x.shape)                             # >> torch.Size([1, 40, 64, 384])
print(x.mean((2, 3), keepdim=True).shape)  # >> torch.Size([1, 40])  -- expected torch.Size([1, 40, 1, 1])
```
This happens when I run the forward pass just after setting
`torch.backends.quantized.engine = 'qnnpack'`.
If I do not set it, the forward pass runs fine and, as expected, about 10x faster than the non-quantized version of my model.
Running this on Android causes the same issue.
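For reference, here is a minimal standalone sketch of what I believe the relevant step reduces to, without the rest of my model. The tensor shape, scale, and zero point are placeholders matching the pdb session above, and this assumes the qnnpack engine is available in the local build; I haven't confirmed this snippet alone triggers the bug outside the full flow.

```python
import torch

# Assumption: qnnpack is listed in torch.backends.quantized.supported_engines
torch.backends.quantized.engine = 'qnnpack'

# Quantize a float tensor with the same shape as in the pdb session;
# scale/zero_point are arbitrary placeholders.
x_float = torch.randn(1, 40, 64, 384)
x = torch.quantize_per_tensor(x_float, scale=0.1, zero_point=0,
                              dtype=torch.quint8)

# Reduce over the spatial dims; with keepdim=True the result should
# be torch.Size([1, 40, 1, 1]), but under qnnpack I observe
# torch.Size([1, 40]) instead.
m = x.mean((2, 3), keepdim=True)
print(m.shape)
```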