RuntimeError: Could not run 'aten::thnn_conv2d_forward' with arguments from the 'QuantizedCPUTensorId' backend. 'aten::thnn_conv2d_forward' is only available for these backends: [CPUTensorId, VariableTensorId]

I am trying to do post-training quantization.
I added QuantStub() and DeQuantStub() in the model:

def __init__(self, N, L, B, Sk, H, P, X, R, C, norm_type, causal, mask_nonlinear):
    super(ConvTasNet, self).__init__()
    # Hyper-parameters
    self.N, self.L, self.B, self.Sk, self.H, self.P, self.X, self.R, self.C = N, L, B, Sk, H, P, X, R, C
    self.norm_type = norm_type
    self.causal = causal
    self.mask_nonlinear = mask_nonlinear
    self.skip_conn = []
    # quantization
    self.quant = QuantStub()
    self.dequant = DeQuantStub()
    # Components
    self.encoder = Encoder(L, N)
    self.separator = TemporalConvNet(N, B, Sk, H, P, X, R, C, norm_type, causal, mask_nonlinear)
    self.decoder = Decoder(N, L)
    # init
    for p in self.parameters():
        if p.dim() > 1:
            nn.init.xavier_normal_(p)

def forward(self, mixture):
    """
    Args:
        mixture: [M, T], M is batch size, T is #samples
    Returns:
        est_source: [M, C, T]
    """
    mixture = self.quant(mixture)
    mixture_w = self.encoder(mixture)
    est_mask = self.separator(mixture_w)
    est_source = self.decoder(mixture_w, est_mask)
    est_source = self.dequant(est_source)

    return est_source

Now, I want to evaluate the model both without quantization and with the converted quantized model.

if __name__ == '__main__':
    args = parser.parse_args()
    print(args)
    # evaluate(args)

    num_calibration_batches = 10
    model = ConvTasNet.load_model('E:\\Project\\MVNS_EC\\quantization\\Simple-Neural-Nets-with-PyTorch-master\\Eager_Post_quant\\MVNS_model\\Quantization\\model\\final.pth.tar')
    model.eval()
    model.qconfig = torch.quantization.default_qconfig
    torch.quantization.prepare(model, inplace=True)
    evaluate(model, args)   # float evaluation, also acts as calibration for the observers
    torch.quantization.convert(model, inplace=True)
    evaluate(model, args)   # evaluation with the quantized model

Evaluation without quantization works, and the model is converted to a quantized model.
I am receiving the error in the last line, when I try to evaluate with the quantized model.
Please guide me: what am I doing wrong?

Could you add model.to('cpu') before model.eval()?

I added model.to('cpu') before model.eval(), but I am still facing the same issue.
Is it because I didn't fuse the model, or is it the placement of QuantStub()?

I am using Conv1d, PReLU and LayerNorm layers.
Does PyTorch support quantization for these layers,
or is it possible only for the following layers?

(Actual layer : quantized layer)
nn.Linear: nnq.Linear,
nn.ReLU: nnq.ReLU,
nn.ReLU6: nnq.ReLU6,
nn.Conv2d: nnq.Conv2d,
nn.Conv3d: nnq.Conv3d,
nn.BatchNorm2d: nnq.BatchNorm2d,
nn.BatchNorm3d: nnq.BatchNorm3d,
QuantStub: nnq.Quantize,
DeQuantStub: nnq.DeQuantize

# Wrapper Modules:
nnq.FloatFunctional: nnq.QFunctional

# Intrinsic modules:
nni.ConvReLU2d: nniq.ConvReLU2d,
nni.ConvReLU3d: nniq.ConvReLU3d,
nni.LinearReLU: nniq.LinearReLU,
nniqat.ConvReLU2d: nniq.ConvReLU2d,
nniqat.LinearReLU: nniq.LinearReLU,
nniqat.ConvBn2d: nnq.Conv2d,
nniqat.ConvBnReLU2d: nniq.ConvReLU2d,

# QAT modules:
nnqat.Linear: nnq.Linear,
nnqat.Conv2d: nnq.Conv2d,

I built a PyTorch model based on Conv1d. I went through quantization and implemented some examples as well, but all of those work on Conv2d, BatchNorm and ReLU. In my case, the model is built on Conv1d and PReLU. Is this quantization valid for these network layers? When I did quantization, only the layers included in the mapping were actually quantized. Let me show you the layers for which quantization is valid (i.e. which are included in the mapping).
Please find the list of modules it supports (according to the source code I went through):
(the same "Actual layer : quantized layer" mapping list as shown above)

Does this mean that quantization can't be done on Conv1d and PReLU?

Conv1d support is being added, https://github.com/pytorch/pytorch/pull/38438. We don't support fusion with PReLU and LayerNorm currently, so those ops will have to execute separately.
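Here is a rough eager-mode sketch (illustration only; the block and its names are made up, not from this thread) of how an op without a quantized counterpart, such as PReLU, can keep running in fp32 by dequantizing before it and re-quantizing after it:

import torch.nn as nn
from torch.quantization import QuantStub, DeQuantStub

class ConvPReLUBlock(nn.Module):
    """Hypothetical block: the conv gets quantized, PReLU stays in fp32."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.dequant = DeQuantStub()   # leave the quantized domain before PReLU
        self.prelu = nn.PReLU(channels)
        self.quant = QuantStub()       # re-enter the quantized domain afterwards

    def forward(self, x):
        x = self.conv(x)
        x = self.dequant(x)
        x = self.prelu(x)
        x = self.quant(x)
        return x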

Okay @supriyar

Apart from fusing with PReLU and LayerNorm, can these two layers be quantized at all? They are not mapped in the default mapping config, as you can see in my last comment.

Thank you @supriyar

We currently do not have support to quantize pReLU, but it should be very similar to leakyReLU which we do support.
Quantization of LayerNorm is supported as seen in https://github.com/pytorch/pytorch/blob/master/torch/nn/quantized/modules/normalization.py#L9

We will add it to our documentation. Thanks for bringing it up!
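As a quick sanity check (sketch only, not from the tutorial), a stand-alone nn.LayerNorm with a qconfig should get swapped to its quantized counterpart by convert(), while PReLU, which has no mapping entry, stays in fp32:

import torch
import torch.nn as nn

m = nn.Sequential(torch.quantization.QuantStub(),
                  nn.LayerNorm(16),
                  torch.quantization.DeQuantStub())
m.qconfig = torch.quantization.default_qconfig
torch.quantization.prepare(m, inplace=True)
m(torch.randn(4, 16))                       # one calibration pass
torch.quantization.convert(m, inplace=True)
print(m)  # nn.LayerNorm should now appear as the quantized LayerNorm module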

Hi @supriyar

Thanks for that. I’m able to see quantized layer mappings for LayerNorm, ReLU & ReLU6 but LeakyReLU is not included here. Is LeakyReLU going to be added?

Thanks @supriyar,
Aravind

You should be able to use the functional one, torch.nn.functional.leaky_relu.
cc @Zafar

@Sasank_Kottapalli Hi, did you solve the problem? I'm facing exactly the same one.


I am facing similar problem. I did model.to(torch.device("cpu")) before model.eval().

I think this error typically means that QuantStub/DeQuantStub is not placed correctly: probably a DeQuantStub is missing before a conv2d, or that conv2d was not quantized.
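One way to narrow it down (just a debugging sketch, not an official API): after convert(), list any conv modules that are still the float nn.Conv* type. Those are the ones that will receive quantized tensors and trigger this error unless a DeQuantStub is placed in front of them.

import torch.nn as nn

# quantized conv modules are not subclasses of nn.ConvNd, so only
# the convs that were left in fp32 are printed here
for name, module in model.named_modules():
    if isinstance(module, (nn.Conv1d, nn.Conv2d, nn.Conv3d)):
        print("still float:", name, type(module).__name__)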


Hi jerryzh, hope you are doing well.
I am facing almost the same issue, so kindly help me with this.
I have trained the model using the fastai and timm libraries.
Currently, I am doing the following:

effb3_model=learner_effb3.model.eval()

backend = "qnnpack"

effb3_model.qconfig = torch.quantization.get_default_qconfig(backend)
torch.backends.quantized.engine = backend
model_static_quantized = torch.quantization.prepare(effb3_model, inplace=False)
model_static_quantized = torch.quantization.convert(model_static_quantized, inplace=False)
print_size_of_model(model_static_quantized)

But I am facing the following error while calling the model for inference:

RuntimeError: Could not run 'aten::thnn_conv2d_forward' with arguments from the 'QuantizedCPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::thnn_conv2d_forward' is only available for these backends: [CPU, CUDA, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradNestedTensor, UNKNOWN_TENSOR_TYPE_ID, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

This is my model:

Sequential(
  (0): Sequential(
    (0): Conv2dSame(3, 40, kernel_size=(3, 3), stride=(2, 2), bias=False)
    (1): QuantizedBatchNorm2d(40, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    (2): SiLU(inplace=True)
    (3): Sequential(
      (0): Sequential(
        (0): DepthwiseSeparableConv(
          (conv_dw): QuantizedConv2d(40, 40, kernel_size=(3, 3), stride=(1, 1), scale=1.0, zero_point=0, padding=(1, 1), groups=40)
          (bn1): QuantizedBatchNorm2d(40, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act1): SiLU(inplace=True)
          (se): SqueezeExcite(
            (conv_reduce): QuantizedConv2d(40, 10, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
            (act1): SiLU(inplace=True)
            (conv_expand): QuantizedConv2d(10, 40, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          )
          (conv_pw): QuantizedConv2d(40, 24, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn2): QuantizedBatchNorm2d(24, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act2): Identity()
        )
        (1): DepthwiseSeparableConv(
          (conv_dw): QuantizedConv2d(24, 24, kernel_size=(3, 3), stride=(1, 1), scale=1.0, zero_point=0, padding=(1, 1), groups=24)
          (bn1): QuantizedBatchNorm2d(24, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act1): SiLU(inplace=True)
          (se): SqueezeExcite(
            (conv_reduce): QuantizedConv2d(24, 6, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
            (act1): SiLU(inplace=True)
            (conv_expand): QuantizedConv2d(6, 24, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          )
          (conv_pw): QuantizedConv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn2): QuantizedBatchNorm2d(24, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act2): Identity()
        )
      )
      (1): Sequential(
        (0): InvertedResidual(
          (conv_pw): QuantizedConv2d(24, 144, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn1): QuantizedBatchNorm2d(144, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act1): SiLU(inplace=True)
          (conv_dw): Conv2dSame(144, 144, kernel_size=(3, 3), stride=(2, 2), groups=144, bias=False)
          (bn2): QuantizedBatchNorm2d(144, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act2): SiLU(inplace=True)
          (se): SqueezeExcite(
            (conv_reduce): QuantizedConv2d(144, 6, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
            (act1): SiLU(inplace=True)
            (conv_expand): QuantizedConv2d(6, 144, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          )
          (conv_pwl): QuantizedConv2d(144, 32, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn3): QuantizedBatchNorm2d(32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        )
        (1): InvertedResidual(
          (conv_pw): QuantizedConv2d(32, 192, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn1): QuantizedBatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act1): SiLU(inplace=True)
          (conv_dw): QuantizedConv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), scale=1.0, zero_point=0, padding=(1, 1), groups=192)
          (bn2): QuantizedBatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act2): SiLU(inplace=True)
          (se): SqueezeExcite(
            (conv_reduce): QuantizedConv2d(192, 8, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
            (act1): SiLU(inplace=True)
            (conv_expand): QuantizedConv2d(8, 192, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          )
          (conv_pwl): QuantizedConv2d(192, 32, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn3): QuantizedBatchNorm2d(32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        )
        (2): InvertedResidual(
          (conv_pw): QuantizedConv2d(32, 192, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn1): QuantizedBatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act1): SiLU(inplace=True)
          (conv_dw): QuantizedConv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), scale=1.0, zero_point=0, padding=(1, 1), groups=192)
          (bn2): QuantizedBatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act2): SiLU(inplace=True)
          (se): SqueezeExcite(
            (conv_reduce): QuantizedConv2d(192, 8, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
            (act1): SiLU(inplace=True)
            (conv_expand): QuantizedConv2d(8, 192, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          )
          (conv_pwl): QuantizedConv2d(192, 32, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn3): QuantizedBatchNorm2d(32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (2): Sequential(
        (0): InvertedResidual(
          (conv_pw): QuantizedConv2d(32, 192, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn1): QuantizedBatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act1): SiLU(inplace=True)
          (conv_dw): Conv2dSame(192, 192, kernel_size=(5, 5), stride=(2, 2), groups=192, bias=False)
          (bn2): QuantizedBatchNorm2d(192, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act2): SiLU(inplace=True)
          (se): SqueezeExcite(
            (conv_reduce): QuantizedConv2d(192, 8, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
            (act1): SiLU(inplace=True)
            (conv_expand): QuantizedConv2d(8, 192, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          )
          (conv_pwl): QuantizedConv2d(192, 48, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn3): QuantizedBatchNorm2d(48, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        )
        (1): InvertedResidual(
          (conv_pw): QuantizedConv2d(48, 288, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn1): QuantizedBatchNorm2d(288, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act1): SiLU(inplace=True)
          (conv_dw): QuantizedConv2d(288, 288, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), groups=288)
          (bn2): QuantizedBatchNorm2d(288, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act2): SiLU(inplace=True)
          (se): SqueezeExcite(
            (conv_reduce): QuantizedConv2d(288, 12, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
            (act1): SiLU(inplace=True)
            (conv_expand): QuantizedConv2d(12, 288, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          )
          (conv_pwl): QuantizedConv2d(288, 48, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn3): QuantizedBatchNorm2d(48, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        )
        (2): InvertedResidual(
          (conv_pw): QuantizedConv2d(48, 288, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn1): QuantizedBatchNorm2d(288, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act1): SiLU(inplace=True)
          (conv_dw): QuantizedConv2d(288, 288, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), groups=288)
          (bn2): QuantizedBatchNorm2d(288, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act2): SiLU(inplace=True)
          (se): SqueezeExcite(
            (conv_reduce): QuantizedConv2d(288, 12, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
            (act1): SiLU(inplace=True)
            (conv_expand): QuantizedConv2d(12, 288, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          )
          (conv_pwl): QuantizedConv2d(288, 48, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn3): QuantizedBatchNorm2d(48, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (3): Sequential(
        (0): InvertedResidual(
          (conv_pw): QuantizedConv2d(48, 288, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn1): QuantizedBatchNorm2d(288, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act1): SiLU(inplace=True)
          (conv_dw): Conv2dSame(288, 288, kernel_size=(3, 3), stride=(2, 2), groups=288, bias=False)
          (bn2): QuantizedBatchNorm2d(288, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act2): SiLU(inplace=True)
          (se): SqueezeExcite(
            (conv_reduce): QuantizedConv2d(288, 12, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
            (act1): SiLU(inplace=True)
            (conv_expand): QuantizedConv2d(12, 288, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          )
          (conv_pwl): QuantizedConv2d(288, 96, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn3): QuantizedBatchNorm2d(96, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        )
        (1): InvertedResidual(
          (conv_pw): QuantizedConv2d(96, 576, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn1): QuantizedBatchNorm2d(576, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act1): SiLU(inplace=True)
          (conv_dw): QuantizedConv2d(576, 576, kernel_size=(3, 3), stride=(1, 1), scale=1.0, zero_point=0, padding=(1, 1), groups=576)
          (bn2): QuantizedBatchNorm2d(576, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act2): SiLU(inplace=True)
          (se): SqueezeExcite(
            (conv_reduce): QuantizedConv2d(576, 24, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
            (act1): SiLU(inplace=True)
            (conv_expand): QuantizedConv2d(24, 576, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          )
          (conv_pwl): QuantizedConv2d(576, 96, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn3): QuantizedBatchNorm2d(96, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        )
        (2): InvertedResidual(
          (conv_pw): QuantizedConv2d(96, 576, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn1): QuantizedBatchNorm2d(576, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act1): SiLU(inplace=True)
          (conv_dw): QuantizedConv2d(576, 576, kernel_size=(3, 3), stride=(1, 1), scale=1.0, zero_point=0, padding=(1, 1), groups=576)
          (bn2): QuantizedBatchNorm2d(576, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act2): SiLU(inplace=True)
          (se): SqueezeExcite(
            (conv_reduce): QuantizedConv2d(576, 24, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
            (act1): SiLU(inplace=True)
            (conv_expand): QuantizedConv2d(24, 576, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          )
          (conv_pwl): QuantizedConv2d(576, 96, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn3): QuantizedBatchNorm2d(96, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        )
        (3): InvertedResidual(
          (conv_pw): QuantizedConv2d(96, 576, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn1): QuantizedBatchNorm2d(576, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act1): SiLU(inplace=True)
          (conv_dw): QuantizedConv2d(576, 576, kernel_size=(3, 3), stride=(1, 1), scale=1.0, zero_point=0, padding=(1, 1), groups=576)
          (bn2): QuantizedBatchNorm2d(576, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act2): SiLU(inplace=True)
          (se): SqueezeExcite(
            (conv_reduce): QuantizedConv2d(576, 24, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
            (act1): SiLU(inplace=True)
            (conv_expand): QuantizedConv2d(24, 576, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          )
          (conv_pwl): QuantizedConv2d(576, 96, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn3): QuantizedBatchNorm2d(96, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        )
        (4): InvertedResidual(
          (conv_pw): QuantizedConv2d(96, 576, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn1): QuantizedBatchNorm2d(576, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act1): SiLU(inplace=True)
          (conv_dw): QuantizedConv2d(576, 576, kernel_size=(3, 3), stride=(1, 1), scale=1.0, zero_point=0, padding=(1, 1), groups=576)
          (bn2): QuantizedBatchNorm2d(576, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act2): SiLU(inplace=True)
          (se): SqueezeExcite(
            (conv_reduce): QuantizedConv2d(576, 24, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
            (act1): SiLU(inplace=True)
            (conv_expand): QuantizedConv2d(24, 576, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          )
          (conv_pwl): QuantizedConv2d(576, 96, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn3): QuantizedBatchNorm2d(96, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (4): Sequential(
        (0): InvertedResidual(
          (conv_pw): QuantizedConv2d(96, 576, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn1): QuantizedBatchNorm2d(576, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act1): SiLU(inplace=True)
          (conv_dw): QuantizedConv2d(576, 576, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), groups=576)
          (bn2): QuantizedBatchNorm2d(576, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act2): SiLU(inplace=True)
          (se): SqueezeExcite(
            (conv_reduce): QuantizedConv2d(576, 24, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
            (act1): SiLU(inplace=True)
            (conv_expand): QuantizedConv2d(24, 576, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          )
          (conv_pwl): QuantizedConv2d(576, 136, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn3): QuantizedBatchNorm2d(136, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        )
        (1): InvertedResidual(
          (conv_pw): QuantizedConv2d(136, 816, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn1): QuantizedBatchNorm2d(816, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act1): SiLU(inplace=True)
          (conv_dw): QuantizedConv2d(816, 816, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), groups=816)
          (bn2): QuantizedBatchNorm2d(816, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act2): SiLU(inplace=True)
          (se): SqueezeExcite(
            (conv_reduce): QuantizedConv2d(816, 34, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
            (act1): SiLU(inplace=True)
            (conv_expand): QuantizedConv2d(34, 816, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          )
          (conv_pwl): QuantizedConv2d(816, 136, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn3): QuantizedBatchNorm2d(136, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        )
        (2): InvertedResidual(
          (conv_pw): QuantizedConv2d(136, 816, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn1): QuantizedBatchNorm2d(816, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act1): SiLU(inplace=True)
          (conv_dw): QuantizedConv2d(816, 816, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), groups=816)
          (bn2): QuantizedBatchNorm2d(816, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act2): SiLU(inplace=True)
          (se): SqueezeExcite(
            (conv_reduce): QuantizedConv2d(816, 34, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
            (act1): SiLU(inplace=True)
            (conv_expand): QuantizedConv2d(34, 816, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          )
          (conv_pwl): QuantizedConv2d(816, 136, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn3): QuantizedBatchNorm2d(136, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        )
        (3): InvertedResidual(
          (conv_pw): QuantizedConv2d(136, 816, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn1): QuantizedBatchNorm2d(816, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act1): SiLU(inplace=True)
          (conv_dw): QuantizedConv2d(816, 816, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), groups=816)
          (bn2): QuantizedBatchNorm2d(816, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act2): SiLU(inplace=True)
          (se): SqueezeExcite(
            (conv_reduce): QuantizedConv2d(816, 34, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
            (act1): SiLU(inplace=True)
            (conv_expand): QuantizedConv2d(34, 816, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          )
          (conv_pwl): QuantizedConv2d(816, 136, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn3): QuantizedBatchNorm2d(136, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        )
        (4): InvertedResidual(
          (conv_pw): QuantizedConv2d(136, 816, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn1): QuantizedBatchNorm2d(816, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act1): SiLU(inplace=True)
          (conv_dw): QuantizedConv2d(816, 816, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), groups=816)
          (bn2): QuantizedBatchNorm2d(816, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act2): SiLU(inplace=True)
          (se): SqueezeExcite(
            (conv_reduce): QuantizedConv2d(816, 34, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
            (act1): SiLU(inplace=True)
            (conv_expand): QuantizedConv2d(34, 816, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          )
          (conv_pwl): QuantizedConv2d(816, 136, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn3): QuantizedBatchNorm2d(136, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (5): Sequential(
        (0): InvertedResidual(
          (conv_pw): QuantizedConv2d(136, 816, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn1): QuantizedBatchNorm2d(816, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act1): SiLU(inplace=True)
          (conv_dw): Conv2dSame(816, 816, kernel_size=(5, 5), stride=(2, 2), groups=816, bias=False)
          (bn2): QuantizedBatchNorm2d(816, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act2): SiLU(inplace=True)
          (se): SqueezeExcite(
            (conv_reduce): QuantizedConv2d(816, 34, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
            (act1): SiLU(inplace=True)
            (conv_expand): QuantizedConv2d(34, 816, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          )
          (conv_pwl): QuantizedConv2d(816, 232, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn3): QuantizedBatchNorm2d(232, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        )
        (1): InvertedResidual(
          (conv_pw): QuantizedConv2d(232, 1392, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn1): QuantizedBatchNorm2d(1392, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act1): SiLU(inplace=True)
          (conv_dw): QuantizedConv2d(1392, 1392, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), groups=1392)
          (bn2): QuantizedBatchNorm2d(1392, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act2): SiLU(inplace=True)
          (se): SqueezeExcite(
            (conv_reduce): QuantizedConv2d(1392, 58, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
            (act1): SiLU(inplace=True)
            (conv_expand): QuantizedConv2d(58, 1392, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          )
          (conv_pwl): QuantizedConv2d(1392, 232, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn3): QuantizedBatchNorm2d(232, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        )
        (2): InvertedResidual(
          (conv_pw): QuantizedConv2d(232, 1392, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn1): QuantizedBatchNorm2d(1392, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act1): SiLU(inplace=True)
          (conv_dw): QuantizedConv2d(1392, 1392, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), groups=1392)
          (bn2): QuantizedBatchNorm2d(1392, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act2): SiLU(inplace=True)
          (se): SqueezeExcite(
            (conv_reduce): QuantizedConv2d(1392, 58, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
            (act1): SiLU(inplace=True)
            (conv_expand): QuantizedConv2d(58, 1392, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          )
          (conv_pwl): QuantizedConv2d(1392, 232, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn3): QuantizedBatchNorm2d(232, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        )
        (3): InvertedResidual(
          (conv_pw): QuantizedConv2d(232, 1392, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn1): QuantizedBatchNorm2d(1392, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act1): SiLU(inplace=True)
          (conv_dw): QuantizedConv2d(1392, 1392, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), groups=1392)
          (bn2): QuantizedBatchNorm2d(1392, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act2): SiLU(inplace=True)
          (se): SqueezeExcite(
            (conv_reduce): QuantizedConv2d(1392, 58, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
            (act1): SiLU(inplace=True)
            (conv_expand): QuantizedConv2d(58, 1392, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          )
          (conv_pwl): QuantizedConv2d(1392, 232, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn3): QuantizedBatchNorm2d(232, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        )
        (4): InvertedResidual(
          (conv_pw): QuantizedConv2d(232, 1392, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn1): QuantizedBatchNorm2d(1392, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act1): SiLU(inplace=True)
          (conv_dw): QuantizedConv2d(1392, 1392, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), groups=1392)
          (bn2): QuantizedBatchNorm2d(1392, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act2): SiLU(inplace=True)
          (se): SqueezeExcite(
            (conv_reduce): QuantizedConv2d(1392, 58, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
            (act1): SiLU(inplace=True)
            (conv_expand): QuantizedConv2d(58, 1392, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          )
          (conv_pwl): QuantizedConv2d(1392, 232, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn3): QuantizedBatchNorm2d(232, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        )
        (5): InvertedResidual(
          (conv_pw): QuantizedConv2d(232, 1392, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn1): QuantizedBatchNorm2d(1392, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act1): SiLU(inplace=True)
          (conv_dw): QuantizedConv2d(1392, 1392, kernel_size=(5, 5), stride=(1, 1), scale=1.0, zero_point=0, padding=(2, 2), groups=1392)
          (bn2): QuantizedBatchNorm2d(1392, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act2): SiLU(inplace=True)
          (se): SqueezeExcite(
            (conv_reduce): QuantizedConv2d(1392, 58, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
            (act1): SiLU(inplace=True)
            (conv_expand): QuantizedConv2d(58, 1392, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          )
          (conv_pwl): QuantizedConv2d(1392, 232, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn3): QuantizedBatchNorm2d(232, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (6): Sequential(
        (0): InvertedResidual(
          (conv_pw): QuantizedConv2d(232, 1392, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn1): QuantizedBatchNorm2d(1392, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act1): SiLU(inplace=True)
          (conv_dw): QuantizedConv2d(1392, 1392, kernel_size=(3, 3), stride=(1, 1), scale=1.0, zero_point=0, padding=(1, 1), groups=1392)
          (bn2): QuantizedBatchNorm2d(1392, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act2): SiLU(inplace=True)
          (se): SqueezeExcite(
            (conv_reduce): QuantizedConv2d(1392, 58, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
            (act1): SiLU(inplace=True)
            (conv_expand): QuantizedConv2d(58, 1392, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          )
          (conv_pwl): QuantizedConv2d(1392, 384, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn3): QuantizedBatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        )
        (1): InvertedResidual(
          (conv_pw): QuantizedConv2d(384, 2304, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn1): QuantizedBatchNorm2d(2304, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act1): SiLU(inplace=True)
          (conv_dw): QuantizedConv2d(2304, 2304, kernel_size=(3, 3), stride=(1, 1), scale=1.0, zero_point=0, padding=(1, 1), groups=2304)
          (bn2): QuantizedBatchNorm2d(2304, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act2): SiLU(inplace=True)
          (se): SqueezeExcite(
            (conv_reduce): QuantizedConv2d(2304, 96, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
            (act1): SiLU(inplace=True)
            (conv_expand): QuantizedConv2d(96, 2304, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          )
          (conv_pwl): QuantizedConv2d(2304, 384, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
          (bn3): QuantizedBatchNorm2d(384, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
    )
    (4): QuantizedConv2d(384, 1536, kernel_size=(1, 1), stride=(1, 1), scale=1.0, zero_point=0)
    (5): QuantizedBatchNorm2d(1536, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
    (6): SiLU(inplace=True)
  )
  (1): Sequential(
    (0): AdaptiveConcatPool2d(
      (ap): AdaptiveAvgPool2d(output_size=1)
      (mp): AdaptiveMaxPool2d(output_size=1)
    )
    (1): Flatten(full=False)
    (2): BatchNorm1d(3072, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): Dropout(p=0.25, inplace=False)
    (4): QuantizedLinear(in_features=3072, out_features=512, scale=1.0, zero_point=0, qscheme=torch.per_tensor_affine)
    (5): ReLU(inplace=True)
    (6): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (7): Dropout(p=0.5, inplace=False)
    (8): QuantizedLinear(in_features=512, out_features=73, scale=1.0, zero_point=0, qscheme=torch.per_tensor_affine)
  )
)

Thanks for any help.

To use eager mode quantization you will need to explicitly place QuantStub/DeQuantStub at the points where quantization needs to happen. Could you follow (beta) Static Quantization with Eager Mode in PyTorch — PyTorch Tutorials 1.9.0+cu102 documentation to quantize your model?
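Roughly, the eager-mode flow from that tutorial looks like this (a sketch adapted to the code above; calibration_loader is a placeholder for your own data loader, and QuantStub/DeQuantStub still have to be added inside the model's forward()):

import torch

model = effb3_model.eval()
backend = "qnnpack"
model.qconfig = torch.quantization.get_default_qconfig(backend)
torch.backends.quantized.engine = backend

# optional but recommended: fuse conv+bn(+relu) patterns before prepare()

prepared = torch.quantization.prepare(model, inplace=False)

# calibration: run a few batches of representative data through the prepared model
with torch.no_grad():
    for images, _ in calibration_loader:  # placeholder loader
        prepared(images)

model_static_quantized = torch.quantization.convert(prepared, inplace=False)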

Alternatively, you can also try out our fx graph mode quantization: (prototype) FX Graph Mode Post Training Static Quantization — PyTorch Tutorials 1.9.0+cu102 documentation

Hi @jerryzh168, thanks for your help.
Actually my model is efficientnet_b3_ns, and I think there are some layers in it (e.g. SiLU) which cannot be quantized.
So what should I do in this case…

Thanks a lot

You can explicitly set the module/functional to use None to skip quantizing it in fx graph mode quantization, for example:
qconfig_dict = {"silu": None}
Full docs for the qconfig_dict api can be found here: pytorch/quantize_fx.py at master · pytorch/pytorch · GitHub
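In case it helps, here is a rough FX graph mode sketch along those lines (based on the qconfig_dict format in quantize_fx; the exact keys may differ between versions, and calibration_loader is a placeholder for your own data):

import torch
from torch.quantization import get_default_qconfig
from torch.quantization.quantize_fx import prepare_fx, convert_fx

qconfig = get_default_qconfig("qnnpack")
torch.backends.quantized.engine = "qnnpack"

# None as the qconfig means "do not quantize this type"
qconfig_dict = {
    "": qconfig,
    "object_type": [(torch.nn.SiLU, None)],
}

prepared = prepare_fx(effb3_model.eval(), qconfig_dict)
with torch.no_grad():
    for images, _ in calibration_loader:  # placeholder loader
        prepared(images)
quantized = convert_fx(prepared)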

Thanks a lot @jerryzh168, but sorry, I couldn't figure it out myself :slight_smile:
Can you suggest where to pass it in the following code:

effb3_model=learner_effb3.model.eval()

backend = "qnnpack"

effb3_model.qconfig = torch.quantization.get_default_qconfig(backend)
torch.backends.quantized.engine = backend
model_static_quantized = torch.quantization.prepare(effb3_model, inplace=False)
model_static_quantized = torch.quantization.convert(model_static_quantized, inplace=False)
print_size_of_model(model_static_quantized)

Thanks a lot…

This is the eager mode API, not FX graph mode. For an introduction to the APIs we have, please take a look at Quantization — PyTorch 1.9.0 documentation

@jerryzh168
Hi
Is there a solution to fix this issue?

RuntimeError: Could not run 'aten::thnn_conv2d_forward' with arguments from the 'QuantizedCPU' backend.

I am facing the same issue.

efficientNet_unet(
  (pretrained): Module(
    (layer1): Sequential(
      (0): Conv2dSameExport(3, 32, kernel_size=(3, 3), stride=(2, 2), bias=False)
      (1): QuantizedBatchNorm2d(32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (2): QuantizedReLU6(inplace=True)
      (3): Sequential(
        (0): DepthwiseSeparableConv(
          (conv_dw): QuantizedConv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), scale=0.5102088451385498, zero_point=58, padding=(1, 1), groups=32, bias=False)
          (bn1): QuantizedBatchNorm2d(32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
          (act1): QuantizedReLU6(inplace=True)
.......
)
  (quant): Quantize(scale=tensor([0.0375]), zero_point=tensor([57]), dtype=torch.quint8)
  (dequant): DeQuantize()

The log shows the issue happens at layer1.
Could this be related to Conv2dSameExport?