Quantization - RuntimeError: apply_dynamic is not implemented for this packed parameter type

I have been trying to apply dynamic quantization to a model. After performing the quantization, I re-evaluate the model to check whether its prediction quality has changed. The problem is that, after moving everything to device='cpu' and running outputs = model_dynamic_quantization(data), I receive the error RuntimeError: apply_dynamic is not implemented for this packed parameter type.
Do you know if there are any steps I am missing in the quantization, or could this be a bug in the code?

torch 2.1.2+cu121
torchaudio 2.1.2+cu121
torchprofile 0.0.4
torchvision 0.16.2+cu121

Here is the error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "CONDA_ENV\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "CONDA_ENV\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "CONDA_ENV\lib\site-packages\torch\ao\nn\quantized\dynamic\modules\linear.py", line 54, in forward
    Y = torch.ops.quantized.linear_dynamic(
  File "CONDA_ENV\lib\site-packages\torch\_ops.py", line 692, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: apply_dynamic is not implemented for this packed parameter type

Here are my original and quantized models:

model
CNNSpec(
  (audio_conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))
  (audio_pool1): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
  (fc1): Linear(in_features=92736, out_features=64, bias=True)
  (fc2): Linear(in_features=64, out_features=2, bias=True)
)

model_dynamic_quantization
CNNSpec(
  (audio_conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))
  (audio_pool1): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
  (fc1): DynamicQuantizedLinear(in_features=92736, out_features=64, dtype=torch.qint8, qscheme=torch.per_tensor_affine)
  (fc2): DynamicQuantizedLinear(in_features=64, out_features=2, dtype=torch.qint8, qscheme=torch.per_tensor_affine)
) 

Here is the code:

import torch
import torch.nn as nn
import torch.nn.functional as F

def dynamic_quantization(model):
    model_quantized = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
    return model_quantized

model_dynamic_quantization = dynamic_quantization(model)

device = 'cpu'
model_dynamic_quantization.eval()  # Set the model to evaluation mode
model_dynamic_quantization.to(device)
with torch.no_grad():  # No need to track gradients during evaluation
    for batch in test_dataloader:
        data, labels = batch
        data, labels = data.to(device), labels.to(device)
        outputs = model_dynamic_quantization(data)

Versions

Collecting environment information…
PyTorch version: 2.1.2+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 11 Pro
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.9.18 (main, Sep 11 2023, 14:09:26) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22621-SP0
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA RTX 2000 Ada Generation Laptop GPU
Nvidia driver version: 538.27
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=2500
DeviceID=CPU0
Family=198
L2CacheSize=11776
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2500
Name=13th Gen Intel(R) Core™ i7-13800H
ProcessorType=3
Revision=

Versions of relevant libraries:
[pip3] numpy==1.26.3
[pip3] torch==2.1.2+cu121
[pip3] torchaudio==2.1.2+cu121
[pip3] torchprofile==0.0.4
[pip3] torchvision==0.16.2+cu121
[conda] numpy 1.26.3 pypi_0 pypi
[conda] torch 2.1.2+cu121 pypi_0 pypi
[conda] torchaudio 2.1.2+cu121 pypi_0 pypi
[conda] torchprofile 0.0.4 pypi_0 pypi
[conda] torchvision 0.16.2+cu121 pypi_0 pypi

Can you try moving the model to CPU before doing the quantization, so that all steps run on the same device, and see if that solves the error? It looks like the model is trying to run the cudnn apply_dynamic op.
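
A minimal sketch of that suggestion, assuming the model and test_dataloader objects from the snippet above and reusing the same quantize_dynamic call:

import torch

device = 'cpu'

# Move the float model to CPU *before* quantizing, as suggested above,
# so the packed linear weights are created for the CPU backend.
model_cpu = model.to(device)
model_cpu.eval()

model_dynamic_quantization = torch.quantization.quantize_dynamic(
    model_cpu, {torch.nn.Linear}, dtype=torch.qint8
)

# Evaluate the quantized model with CPU inputs as well.
with torch.no_grad():
    for data, labels in test_dataloader:
        data, labels = data.to(device), labels.to(device)
        outputs = model_dynamic_quantization(data)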