Quantization - RuntimeError: apply_dynamic is not implemented for this packed parameter type

I have been trying to apply dynamic quantization to a model. After performing the quantization, I re-evaluate the model to check whether its prediction quality has changed. The problem is that, after moving everything to device='cpu' and running outputs = model_dynamic_quantization(data), I receive the error RuntimeError: apply_dynamic is not implemented for this packed parameter type.
Do you know if there are any steps I am missing in the quantization, or could this be a bug in the code?

torch 2.1.2+cu121
torchaudio 2.1.2+cu121
torchprofile 0.0.4
torchvision 0.16.2+cu121

Here is the error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "CONDA_ENV\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "CONDA_ENV\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "CONDA_ENV\lib\site-packages\torch\ao\nn\quantized\dynamic\modules\linear.py", line 54, in forward
    Y = torch.ops.quantized.linear_dynamic(
  File "CONDA_ENV\lib\site-packages\torch\_ops.py", line 692, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: apply_dynamic is not implemented for this packed parameter type

Here are my original and quantized models:

model
CNNSpec(
  (audio_conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))
  (audio_pool1): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
  (fc1): Linear(in_features=92736, out_features=64, bias=True)
  (fc2): Linear(in_features=64, out_features=2, bias=True)
)

model_dynamic_quantization
CNNSpec(
  (audio_conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))
  (audio_pool1): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
  (fc1): DynamicQuantizedLinear(in_features=92736, out_features=64, dtype=torch.qint8, qscheme=torch.per_tensor_affine)
  (fc2): DynamicQuantizedLinear(in_features=64, out_features=2, dtype=torch.qint8, qscheme=torch.per_tensor_affine)
) 

Here is the code:

import torch
import torch.nn as nn
import torch.nn.functional as F

def dynamic_quantization(model):
    model_quantized = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
    return model_quantized

model_dynamic_quantization = dynamic_quantization(model)

device = 'cpu'
model_dynamic_quantization.eval()  # Set the model to evaluation mode
model_dynamic_quantization.to(device)
with torch.no_grad():  # No need to track gradients during evaluation
    for batch in test_dataloader:
        data, labels = batch
        data, labels = data.to(device), labels.to(device)
        outputs = model_dynamic_quantization(data)

Versions

Collecting environment information…
PyTorch version: 2.1.2+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 11 Pro
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.9.18 (main, Sep 11 2023, 14:09:26) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22621-SP0
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA RTX 2000 Ada Generation Laptop GPU
Nvidia driver version: 538.27
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=2500
DeviceID=CPU0
Family=198
L2CacheSize=11776
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2500
Name=13th Gen Intel(R) Core™ i7-13800H
ProcessorType=3
Revision=

Versions of relevant libraries:
[pip3] numpy==1.26.3
[pip3] torch==2.1.2+cu121
[pip3] torchaudio==2.1.2+cu121
[pip3] torchprofile==0.0.4
[pip3] torchvision==0.16.2+cu121
[conda] numpy 1.26.3 pypi_0 pypi
[conda] torch 2.1.2+cu121 pypi_0 pypi
[conda] torchaudio 2.1.2+cu121 pypi_0 pypi
[conda] torchprofile 0.0.4 pypi_0 pypi
[conda] torchvision 0.16.2+cu121 pypi_0 pypi

Can you try moving the model to CPU before doing the quantization, so that all steps run on the same device, and see if that solves the error? It looks like the model is trying to run the cudnn apply_dynamic op.
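
A minimal sketch of that suggestion, assuming the model and test_dataloader objects from the snippet above and reusing the same quantize_dynamic call:

import torch

device = 'cpu'

# Move the float model to CPU *before* quantizing, as suggested above,
# so the packed linear weights are created for the CPU backend.
model_cpu = model.to(device)
model_cpu.eval()

model_dynamic_quantization = torch.quantization.quantize_dynamic(
    model_cpu, {torch.nn.Linear}, dtype=torch.qint8
)

# Evaluate the quantized model with CPU inputs as well.
with torch.no_grad():
    for data, labels in test_dataloader:
        data, labels = data.to(device), labels.to(device)
        outputs = model_dynamic_quantization(data)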