I have been trying to apply dynamic quantization to a model.
After performing the quantization, I try to re-evaluate the model to check whether its predictive performance has changed.
The problem is that, after moving everything to device='cpu' and running outputs = model_dynamic_quantization(data), I get RuntimeError: apply_dynamic is not implemented for this packed parameter type.
Do you know if there are any steps I am missing in the quantization workflow, or could this be a bug?
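One thing I am not sure about is whether the float model has to be moved to the CPU before calling quantize_dynamic, since the dynamically quantized Linear kernels only run on CPU. A minimal sketch of the variant I understand the docs to describe (the exact order of the .to('cpu') call is my assumption, not something I have verified):

# Assumption: quantize a CPU copy of the float model first, then run inference on CPU.
model.to('cpu')
model.eval()
model_quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
with torch.no_grad():
    outputs = model_quantized(data.to('cpu'))  # data: one batch from test_dataloader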
torch 2.1.2+cu121
torchaudio 2.1.2+cu121
torchprofile 0.0.4
torchvision 0.16.2+cu121
Here is the error:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "CONDA_ENV\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "CONDA_ENV\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "CONDA_ENV\lib\site-packages\torch\ao\nn\quantized\dynamic\modules\linear.py", line 54, in forward
Y = torch.ops.quantized.linear_dynamic(
File "CONDA_ENV\lib\site-packages\torch\_ops.py", line 692, in __call__
return self._op(*args, **kwargs or {})
RuntimeError: apply_dynamic is not implemented for this packed parameter type
Here are my original and quantized models:
model
CNNSpec(
(audio_conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))
(audio_pool1): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
(fc1): Linear(in_features=92736, out_features=64, bias=True)
(fc2): Linear(in_features=64, out_features=2, bias=True)
)
model_dynamic_quantization
CNNSpec(
(audio_conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))
(audio_pool1): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
(fc1): DynamicQuantizedLinear(in_features=92736, out_features=64, dtype=torch.qint8, qscheme=torch.per_tensor_affine)
(fc2): DynamicQuantizedLinear(in_features=64, out_features=2, dtype=torch.qint8, qscheme=torch.per_tensor_affine)
)
Here is the code:
import torch
import torch.nn as nn
import torch.nn.functional as F

def dynamic_quantization(model):
    # Quantize only the Linear layers to int8 with dynamic quantization
    model_quantized = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
    return model_quantized

model_dynamic_quantization = dynamic_quantization(model)

device = 'cpu'
model_dynamic_quantization.eval()  # Set the model to evaluation mode
model_dynamic_quantization.to(device)  # Dynamically quantized ops run on CPU

with torch.no_grad():  # No need to track gradients during evaluation
    for batch in test_dataloader:
        data, labels = batch
        data, labels = data.to(device), labels.to(device)
        outputs = model_dynamic_quantization(data)
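In case it helps isolate the problem, here is a self-contained sketch of the same flow with a dummy CNNSpec and random input. The layer shapes are copied from the module printout above, but the forward() body and the 1x86x140 input size are my reconstruction (chosen so that 32 * 42 * 69 = 92736 matches fc1's in_features), not my actual training code or data:

import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNSpec(nn.Module):
    # Layer shapes taken from the printout above; forward() is an assumption.
    def __init__(self):
        super().__init__()
        self.audio_conv1 = nn.Conv2d(1, 32, kernel_size=3)
        self.audio_pool1 = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(92736, 64)
        self.fc2 = nn.Linear(64, 2)

    def forward(self, x):
        x = self.audio_pool1(F.relu(self.audio_conv1(x)))
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        return self.fc2(x)

model = CNNSpec()
model_dynamic_quantization = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
model_dynamic_quantization.eval()

# Random stand-in for one spectrogram batch: (86, 140) -> conv -> pool
# gives 32 * 42 * 69 = 92736 features, matching fc1.
data = torch.randn(1, 1, 86, 140)
with torch.no_grad():
    outputs = model_dynamic_quantization(data)
print(outputs.shape)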
Versions
Collecting environment information…
PyTorch version: 2.1.2+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Microsoft Windows 11 Pro
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A
Python version: 3.9.18 (main, Sep 11 2023, 14:09:26) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22621-SP0
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA RTX 2000 Ada Generation Laptop GPU
Nvidia driver version: 538.27
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture=9
CurrentClockSpeed=2500
DeviceID=CPU0
Family=198
L2CacheSize=11776
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2500
Name=13th Gen Intel(R) Core™ i7-13800H
ProcessorType=3
Revision=
Versions of relevant libraries:
[pip3] numpy==1.26.3
[pip3] torch==2.1.2+cu121
[pip3] torchaudio==2.1.2+cu121
[pip3] torchprofile==0.0.4
[pip3] torchvision==0.16.2+cu121
[conda] numpy 1.26.3 pypi_0 pypi
[conda] torch 2.1.2+cu121 pypi_0 pypi
[conda] torchaudio 2.1.2+cu121 pypi_0 pypi
[conda] torchprofile 0.0.4 pypi_0 pypi
[conda] torchvision 0.16.2+cu121 pypi_0 pypi