AttributeError: 'NoneType' object has no attribute 'dequantize'

Can you please explain why the model below does not work and throws the error message in the title?

class Simple1DCNN(torch.nn.Module):
    def __init__(self):
        super(Simple1DCNN, self).__init__()
        self.layer1 = torch.nn.Conv1d(in_channels=7, out_channels=20, kernel_size=5, stride=2)
        self.act1 = torch.nn.ReLU()
        self.layer2 = torch.nn.Conv1d(in_channels=20, out_channels=500000, kernel_size=1)
    def forward(self, x):
        x = self.layer1(x)
        x = self.act1(x)
        x = self.layer2(x)
    
# Instantiate the model
model = Simple1DCNN()

class QuantizedModel(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        
        self.quant = torch.quantization.QuantStub()
        self.model = model
        self.dequant = torch.quantization.DeQuantStub()
        
    def forward(self, x):
        x = self.quant(x)
        x = self.model(x)
        x = self.dequant(x)
        return x

while this other one does?

class QuantizedModel(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.layer1 = torch.nn.Conv1d(in_channels=7, out_channels=20, kernel_size=5, stride=2)
        self.act1 = torch.nn.ReLU()
        self.layer2 = torch.nn.Conv1d(in_channels=20, out_channels=500000, kernel_size=1)
        self.dequant = torch.quantization.DeQuantStub()
        
    def forward(self, x):
        x = self.quant(x)
        x = self.layer1(x)
        x = self.act1(x)
        x = self.layer2(x)
        x = self.dequant(x)
        return x

The rest of the code is common to both quantized models:

import torch
import torch.nn as nn
import torch.quantization as quant

quantized_model = QuantizedModel(model)
quantized_model.eval()

backend = "qnnpack"
quantized_model.qconfig = torch.ao.quantization.get_default_qconfig(backend)
model_static_prepared = torch.quantization.prepare(quantized_model, inplace=False)
model_static_quantized = torch.quantization.convert(model_static_prepared, inplace=False)

input_quant = torch.randn(20, 7, 5)
model_static_quantized(input_quant)

I believe the problem is in the way the model, quant, and dequant layers are connected (first case), as the quantized model seems to be slightly different from when each individual layer is connected directly (second case).
Or maybe I'm missing module fusion?

In your first model, you’re missing return x in the forward function.
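
For reference, a minimal sketch of the fix, just adding the missing return to Simple1DCNN.forward so it no longer returns None:

    def forward(self, x):
        x = self.layer1(x)
        x = self.act1(x)
        x = self.layer2(x)
        return x  # without this, forward returns None and dequant(None) raises the AttributeError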

Good call, I forgot about that, thanks.

Hi guys
Do you know how to quantize the weights of a transformer layer (Quantization-Aware Training)?

Hi, that's tough to answer without your model's code. The official quantization documentation should be enough to get you started though: Quantization — PyTorch 2.1 documentation.

Transformer layers are usually just standard linear operations. For CPU you can use the guide @Khumbaba pointed you to.

For GPU quantization we designed torchao specifically for transformers. See 1 2 3 for examples of where it's been used so far.
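
As a rough illustration (not from this thread), weight-only int8 quantization with torchao looks roughly like the sketch below; the exact entry points (quantize_, int8_weight_only) depend on the torchao version you have installed, and the toy model is just a stand-in for your transformer:

import torch
from torchao.quantization import quantize_, int8_weight_only

m = torch.nn.Sequential(torch.nn.Linear(768, 768)).cuda().eval()
quantize_(m, int8_weight_only())  # swaps Linear weights for int8 tensor subclasses, in place
out = m(torch.randn(1, 768, device="cuda"))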

Thanks for your guidance @Khumbaba
Your suggested resources are very interesting, and I will check them out @HDCharles

I use the Whisper Small model. This model has 12 encoder layers. I want to quantize the second encoder layer and then fine-tune the model. My code is as follows:

transformer_layer = model.model.encoder.layers[2]

# Quantize the transformer layer
quantized_transformer = torch.quantization.quantize_dynamic(
    transformer_layer, {torch.nn.Linear}, dtype=torch.qint8
)

# Replace the original transformer layer with the quantized version
model.model.encoder.layers[2] = quantized_transformer

After executing this code, the size of the model decreased, but when I printed the weights using the code below, I could not see any change in the weights. Is my workflow correct?

for name, param in quantized_transformer.named_parameters():
    if "weight" in name:
        print(f"Parameter Name: {name}")
        print("Quantized Weights:")
        print(param)

Thank you very much for your guidance.
Best,
Hajar

And the error I encounter during fine-tuning is as follows. My model is set to CPU:

model.to('cpu')

NotImplementedError: Could not run ‘quantized::linear_dynamic’ with arguments from the ‘CUDA’ backend. This could be because the operator doesn’t exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit Internal Login for possible resolutions. ‘quantized::linear_dynamic’ is only available for these backends: [CPU, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, AutogradMeta, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].

The encoder section of the model includes two convolution layers and 12 transformer layers, as shown below. Following Quantization — PyTorch 2.1 documentation, I used the code below, but I could not quantize the layers of the model (neither the convolutions nor the transformers):

model.eval()
model.qconfig = torch.ao.quantization.get_default_qat_qconfig('x86')
model_fused = torch.ao.quantization.fuse_modules(model.model.encoder,
    [['conv1']])
model_prepared = torch.ao.quantization.prepare_qat(model_fused.train())

WhisperForConditionalGeneration(
  (model): WhisperModel(
    (encoder): WhisperEncoder(
      (conv1): Conv1d(80, 768, kernel_size=(3,), stride=(1,), padding=(1,))
      (conv2): Conv1d(768, 768, kernel_size=(3,), stride=(2,), padding=(1,))
      (embed_positions): Embedding(1500, 768)
      (layers): ModuleList(
        (0-11): 12 x WhisperEncoderLayer(
          (self_attn): WhisperSdpaAttention(
            (k_proj): Linear(in_features=768, out_features=768, bias=False)
            (v_proj): Linear(in_features=768, out_features=768, bias=True)
            (q_proj): Linear(in_features=768, out_features=768, bias=True)
            (out_proj): Linear(in_features=768, out_features=768, bias=True)
          )
          (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (activation_fn): GELUActivation()
          (fc1): Linear(in_features=768, out_features=3072, bias=True)
          (fc2): Linear(in_features=3072, out_features=768, bias=True)
          (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        )
      )
      (layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    )

I would just print the layer and look to see whether the Linear layer changed to DynamicQuantizedLinear. It looks like this is mostly working; I would try to run just the quantized layer without the rest of the model to see if it works.
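
As a rough sketch of that check (assuming your quantized layer is still at model.model.encoder.layers[2]; fc1 is just one of its Linear submodules):

layer_q = model.model.encoder.layers[2]
print(layer_q.fc1)                 # should now print as a dynamically quantized Linear
# Dynamic quantization packs the weights, so named_parameters() no longer lists them;
# use the weight() accessor on the quantized module to inspect them instead.
w = layer_q.fc1.weight()
print(w.dtype)                     # torch.qint8
print(w.dequantize())              # float view of the int8 values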

You get that error when your model or inputs are on CUDA. It could also be that your model has code that forces something onto CUDA. If you want your model to work on CUDA, use torchao (linked above).
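
For example, a minimal sketch (inputs here is a hypothetical dict of tensors, e.g. from a Hugging Face processor) that makes sure both the model and the batch live on CPU before the forward pass:

model.to("cpu")
inputs = {k: v.to("cpu") for k, v in inputs.items()}
with torch.no_grad():
    out = model(**inputs)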

In your most recent comment you are not following the linked documentation. You need to insert quant/dequant stubs for that method, and the qconfig you selected is for QAT. The output of that flow would normally be a model ready for QAT, not a quantized model; you'd call convert once QAT is done.
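
A minimal sketch of that eager-mode QAT flow, assuming float_model is your module and the stubs wrap everything you want quantized:

class QATWrapper(torch.nn.Module):
    def __init__(self, float_model):
        super().__init__()
        self.quant = torch.ao.quantization.QuantStub()
        self.model = float_model
        self.dequant = torch.ao.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.model(x)
        return self.dequant(x)

qat_model = QATWrapper(float_model)
qat_model.qconfig = torch.ao.quantization.get_default_qat_qconfig("x86")
prepared = torch.ao.quantization.prepare_qat(qat_model.train())
# ... fine-tune `prepared` here (quantization-aware training) ...
prepared.eval()
quantized = torch.ao.quantization.convert(prepared)  # only now do you get a quantized model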

Lastly, if you still can't get it working, try making a toy model like torch.nn.Sequential(torch.nn.Linear(3, 3)) and following the quantization documentation. If you still see CUDA errors and the like for the toy model, you'll know you're doing something wrong rather than just having a weird model.
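
For instance, a quick CPU-only sanity check with dynamic quantization (mirroring the quantize_dynamic call from earlier in the thread):

import torch

toy = torch.nn.Sequential(torch.nn.Linear(3, 3)).eval()
toy_q = torch.ao.quantization.quantize_dynamic(toy, {torch.nn.Linear}, dtype=torch.qint8)
print(toy_q)                     # the Linear should print as a dynamically quantized Linear
print(toy_q(torch.randn(4, 3)))  # float in, float out, runs on CPU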

Hi @HDCharles
Thank you for your guidance. I am trying to start with a toy model. I hope I can solve the problem. I will definitely share the result of my work with you.

Best
Hajar