Quantization - RuntimeError: apply_dynamic is not implemented for this packed parameter type

I have been trying to apply dynamic quantization to a model.
After quantizing, I re-evaluate the model to check for any change in its predictive power.
The problem is that, after moving everything to device='cpu' and running outputs = model(data), I get the error RuntimeError: apply_dynamic is not implemented for this packed parameter type.
Am I missing a step in the quantization, or is this possibly a bug in the code?

torch 2.1.2+cu121
torchaudio 2.1.2+cu121
torchprofile 0.0.4
torchvision 0.16.2+cu121

Here is the error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "CONDA_ENV\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "CONDA_ENV\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "CONDA_ENV\lib\site-packages\torch\ao\nn\quantized\dynamic\modules\linear.py", line 54, in forward
    Y = torch.ops.quantized.linear_dynamic(
  File "CONDA_ENV\lib\site-packages\torch\_ops.py", line 692, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: apply_dynamic is not implemented for this packed parameter type

Here are my original and quantized models:

model
CNNSpec(
  (audio_conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))
  (audio_pool1): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
  (fc1): Linear(in_features=92736, out_features=64, bias=True)
  (fc2): Linear(in_features=64, out_features=2, bias=True)
)

model_dynamic_quantization
CNNSpec(
  (audio_conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))
  (audio_pool1): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
  (fc1): DynamicQuantizedLinear(in_features=92736, out_features=64, dtype=torch.qint8, qscheme=torch.per_tensor_affine)
  (fc2): DynamicQuantizedLinear(in_features=64, out_features=2, dtype=torch.qint8, qscheme=torch.per_tensor_affine)
) 

Here is the code:

import torch
import torch.nn as nn
import torch.nn.functional as F

def dynamic_quantization(model):
    # Replace every nn.Linear with its dynamically quantized (int8 weights) counterpart
    model_quantized = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
    return model_quantized

model_dynamic_quantization = dynamic_quantization(model)

device = 'cpu'
model_dynamic_quantization.eval()  # Set the model to evaluation mode
model_dynamic_quantization.to(device)
with torch.no_grad():  # No need to track gradients during evaluation
    for batch in test_dataloader:
        data, labels = batch
        data, labels = data.to(device), labels.to(device)
        # Evaluate the quantized model (the traceback comes from its quantized Linear layers)
        outputs = model_dynamic_quantization(data)

Versions

Collecting environment information…
PyTorch version: 2.1.2+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 11 Pro
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.9.18 (main, Sep 11 2023, 14:09:26) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22621-SP0
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA RTX 2000 Ada Generation Laptop GPU
Nvidia driver version: 538.27
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=2500
DeviceID=CPU0
Family=198
L2CacheSize=11776
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2500
Name=13th Gen Intel(R) Core™ i7-13800H
ProcessorType=3
Revision=

Versions of relevant libraries:
[pip3] numpy==1.26.3
[pip3] torch==2.1.2+cu121
[pip3] torchaudio==2.1.2+cu121
[pip3] torchprofile==0.0.4
[pip3] torchvision==0.16.2+cu121
[conda] numpy 1.26.3 pypi_0 pypi
[conda] torch 2.1.2+cu121 pypi_0 pypi
[conda] torchaudio 2.1.2+cu121 pypi_0 pypi
[conda] torchprofile 0.0.4 pypi_0 pypi
[conda] torchvision 0.16.2+cu121 pypi_0 pypi

Can you try moving the model to the CPU before quantizing, so that all steps run on the same device, and see if that solves the error? It looks like the model is trying to run the cuDNN apply_dynamic op.
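
A minimal sketch of that suggestion, assuming a small stand-in network (TinyNet is hypothetical; the CNNSpec definition is not shown in this thread): move the model to the CPU and switch it to eval mode first, then quantize, then run the quantized model on CPU data.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for CNNSpec, whose definition is not shown in the thread.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(16, 8)
        self.fc2 = nn.Linear(8, 2)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

def quantize_on_cpu(model):
    # Move to CPU and set eval mode *before* quantizing, so every step uses one device.
    model = model.to("cpu").eval()
    return torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

quantized = quantize_on_cpu(TinyNet())
with torch.no_grad():
    out = quantized(torch.randn(4, 16))  # CPU input for the CPU-quantized model
print(out.shape)
```

If this small case runs but the full model does not, the difference between the two is the next place to look.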

Thank you for your suggestion! Sadly, the model is already on the CPU. The same applies to the data and labels before running model(data):

next(model.parameters()).device
device(type='cpu')
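
Note that next(model.parameters()) inspects only the first parameter. A slightly broader check, sketched below, collects the device of every parameter and buffer. The helper name device_types and the toy model are illustrative assumptions. The last two lines print the active quantized backend, which nobody in this thread has raised as a cause, but which may be useful context to include when reporting the issue.

```python
import torch
import torch.nn as nn

def device_types(model):
    """Return the set of device types used by all parameters and buffers."""
    found = {p.device.type for p in model.parameters()}
    found |= {b.device.type for b in model.buffers()}
    return found

# Hypothetical toy model; the thread's CNNSpec definition is not shown.
toy = nn.Sequential(nn.Conv2d(1, 4, kernel_size=3), nn.Flatten(), nn.Linear(4 * 6 * 6, 2))
toy_devices = device_types(toy)
print("devices:", toy_devices)

# Quantized backend information exposed by PyTorch.
print("active quantized engine:", torch.backends.quantized.engine)
print("supported quantized engines:", torch.backends.quantized.supported_engines)
```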

Can you try the following:

  1. Try just calling torch.quantization.quantize_dynamic(model). It looks like the qconfig spec might not be specified correctly in the code above.

  2. Replace the model with torch.nn.Sequential(torch.nn.Linear(32,32)) and see if that works. Test it by passing in random data, e.g. model(torch.randn(32,32)). The error might be specific to some aspect of the model.

  3. Can you take a look at the following test:

and see if that test runs successfully on your machine?
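
Steps 1 and 2 above can be sketched together as a small self-contained script. It uses only the calls named in the suggestions; the variable names are illustrative. If this fails with the same apply_dynamic error, the problem is likely in the environment rather than in the original model.

```python
import torch

# Step 2: the smallest possible model, a single Linear layer on the CPU.
model = torch.nn.Sequential(torch.nn.Linear(32, 32)).eval()

# Step 1: call quantize_dynamic with its defaults (no explicit qconfig spec or dtype).
quantized_model = torch.quantization.quantize_dynamic(model)
print(quantized_model)

# Pass random data through the quantized model.
with torch.no_grad():
    result = quantized_model(torch.randn(32, 32))
print(result.shape)
```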