Hello,
Passing certain quantized inputs into a quantized linear layer gives the error: quantized::linear (xnnpack): xnn create operator failed(2). I am on torch version 2.3.1, using the qnnpack backend. The error seems to occur whenever the input scale is at or below roughly 1e-38. The following is a minimal example.
import torch as th
th.backends.quantized.engine = "qnnpack"
# Create input
x = th.tensor([[0.1]])  # shape (1, 1)
xq = th.quantize_per_tensor(x, scale=1e-38, zero_point=0, dtype=th.qint8)
# Create quantized FC layer (th.nn.quantized.Linear is a deprecated alias of th.ao.nn.quantized.Linear)
fc = th.ao.nn.quantized.Linear(1, 1)
# Fails here with: quantized::linear (xnnpack): xnn create operator failed(2)
fc(xq)
So I think it's a case of the scale being so small that it is subnormal in float32, and the backend rejects it. However, I ran into this issue during QAT (at that point the scale had reached 1e-40), and I am not sure how I can force the model to only use representable values. Is it correct that the error is being caused by the input scale being too small? Is there a workaround to prevent the scale from becoming too small during training? The only idea I have so far is the clamping sketch below.