Quantization of a VGG16 pretrained model

I tried quantizing the weights of a pretrained VGG16 model from torchvision.models.
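For context, the quantization code itself isn't shown here. A standard eager-mode post-training static quantization flow that yields torch.qint8 weights like the ones described below looks roughly like this (an illustrative sketch, not necessarily the exact code used):

from torch.ao.quantization import get_default_qconfig, prepare, convert
import torchvision.models as models

# Load the pretrained FP32 model and switch to eval mode
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()

# Attach a quantization config and insert observers
model.qconfig = get_default_qconfig("fbgemm")
prepared = prepare(model)

# ... feed a few calibration batches through `prepared` here ...

# Swap modules for quantized versions; Conv2d/Linear weights become torch.qint8
model_int8 = convert(prepared)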


The image shows what the model looks like after quantization. The feature weights of the model are in torch.qint8 format. I want to run inference with the quantized model, but I am getting a runtime error:

RuntimeError: Unable to find an engine to execute this computation in Quantized Conv2D Cudnn.

from time import time

import torch

# model_int8 and testloader are defined earlier in the notebook
all_labels_int8 = []
all_predictions_int8 = []
correct_int8 = 0
total_int8 = 0

start_time_int8 = time()
with torch.no_grad():
    model_int8.eval()
    for images, labels in testloader:
        all_labels_int8.extend(labels.numpy())
        # images, labels = images.to(device), labels.to(device)
        # Quantize the FP32 input batch so it matches the model's quantized ops
        images = torch.quantize_per_tensor(images, scale=1.0, zero_point=0, dtype=torch.qint8)
        outputs = model_int8(images)
        _, predicted = torch.max(outputs.data, 1)
        total_int8 += labels.size(0)
        correct_int8 += (predicted == labels).sum().item()
        # predicted_tensor_cpu = predicted.to('cpu')
        all_predictions_int8.extend(predicted.numpy())
end_time_int8 = time()
print("Time: ", end_time_int8 - start_time_int8)

print('Accuracy achieved by the network on test images is: %d%%' % (100 * correct_int8 / total_int8))

This is the code I used for inference.

Hi @Siva6233, can you share the full stack trace please?

Hello @jcaip,
This is the complete stack trace.


RuntimeError                              Traceback (most recent call last)
Cell In[43], line 12
     10 #images, labels = images.to(device), labels.to(device)
     11 images = torch.quantize_per_tensor(images, scale=1.0, zero_point=0, dtype=torch.qint8)
---> 12 outputs = model_int8(images)
     13 _, predicted = torch.max(outputs.data, 1)
     14 total_int8 += labels.size(0)

File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File /opt/conda/lib/python3.10/site-packages/torchvision/models/vgg.py:66, in VGG.forward(self, x)
    65 def forward(self, x: torch.Tensor) -> torch.Tensor:
---> 66     x = self.features(x)
    67     x = self.avgpool(x)
    68     x = torch.flatten(x, 1)

File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/container.py:217, in Sequential.forward(self, input)
    215 def forward(self, input):
    216     for module in self:
--> 217         input = module(input)
    218     return input

File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py:469, in Conv2d.forward(self, input)
    466 _reversed_padding_repeated_twice = _reverse_repeat_padding(self.padding)
    467 input = F.pad(input, _reversed_padding_repeated_twice,
    468               mode=self.padding_mode)
--> 469 return ops.quantized.conv2d(
    470     input, self._packed_params, self.scale, self.zero_point)

File /opt/conda/lib/python3.10/site-packages/torch/_ops.py:502, in OpOverloadPacket.__call__(self, *args, **kwargs)
    497 def __call__(self, *args, **kwargs):
    498     # overloading __call__ to ensure torch.ops.foo.bar()
    499     # is still callable from JIT
    500     # We save the function ptr as the op attribute on
    501     # OpOverloadPacket to access it here.
--> 502     return self._op(*args, **kwargs or {})

RuntimeError: Unable to find an engine to execute this computation in Quantized Conv2D Cudnn

Thanks, this seems like a CUDA/cuDNN version error to me. Just making sure: can you run the model without quantization? If so, can you share the result you get from running

wget https://raw.githubusercontent.com/pytorch/pytorch/main/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
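As an aside, quantized inference on CUDA/cuDNN is still at the prototype stage; the quantized kernels are most mature on CPU. If you mainly need int8 accuracy and timing numbers, one thing worth trying is keeping everything on CPU with the fbgemm engine. A rough sketch based on your snippet above (note that the CPU quantized conv kernels expect quint8 activations rather than qint8):

import torch

# Select the x86 CPU quantized backend
torch.backends.quantized.engine = "fbgemm"

model_int8.cpu().eval()
with torch.no_grad():
    for images, labels in testloader:
        # CPU quantized conv expects quint8 activations (weights stay qint8)
        q_images = torch.quantize_per_tensor(images, scale=1.0, zero_point=0, dtype=torch.quint8)
        outputs = model_int8(q_images)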

Hello @jcaip ,

Thank you for the reply.

I am able to run the model without quantization.

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace=True)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace=True)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace=True)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace=True)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace=True)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace=True)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (fc1): Linear(in_features=25088, out_features=4096, bias=True)
    (relu1): ReLU()
    (dropout1): Dropout(p=0.5, inplace=False)
    (fc2): Linear(in_features=4096, out_features=14, bias=True)
    (output): LogSoftmax(dim=1)
  )
)
This is the model I am using (without quantization).
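For completeness, the stock VGG16 classifier was replaced with a 14-class head for the flower dataset. Reconstructed from the printout above (an equivalent sketch, not the original training code):

import torch.nn as nn
from collections import OrderedDict

# Rebuild the printed classifier head; layer names match the printout
model.classifier = nn.Sequential(OrderedDict([
    ("fc1", nn.Linear(25088, 4096)),
    ("relu1", nn.ReLU()),
    ("dropout1", nn.Dropout(p=0.5)),
    ("fc2", nn.Linear(4096, 14)),
    ("output", nn.LogSoftmax(dim=1)),
]))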

I am using the model for flower classification, and I just want the inference time and accuracy of the model.

This is the output I get from running collect_env.py:

Collecting environment information…
PyTorch version: 2.0.0
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) [GCC 12.3.0] (64-bit runtime)
Python platform: Linux-5.15.133+-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: Tesla T4
GPU 1: Tesla T4

Nvidia driver version: 470.161.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU @ 2.00GHz
CPU family: 6
Model: 85
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
Stepping: 3
BogoMIPS: 4000.24
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat md_clear arch_capabilities
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 64 KiB (2 instances)
L1i cache: 64 KiB (2 instances)
L2 cache: 2 MiB (2 instances)
L3 cache: 38.5 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-3
Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Mitigation; PTE Inversion
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Retbleed: Mitigation; IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; IBRS, IBPB conditional, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Mitigation; Clear CPU buffers; SMT Host state unknown

Versions of relevant libraries:
[pip3] flake8==6.1.0
[pip3] msgpack-numpy==0.4.8
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.24.3
[pip3] onnx==1.15.0
[pip3] pytorch-ignite==0.4.13
[pip3] pytorch-lightning==2.1.1
[pip3] torch==2.0.0
[pip3] torchaudio==2.0.1
[pip3] torchdata==0.6.0
[pip3] torchinfo==1.8.0
[pip3] torchmetrics==1.2.0
[pip3] torchtext==0.15.1
[pip3] torchvision==0.15.1
[conda] cudatoolkit 11.8.0 h4ba93d1_12 conda-forge
[conda] magma-cuda118 2.6.1 1 pytorch
[conda] mkl 2023.1.0 h213fc3f_46344
[conda] msgpack-numpy 0.4.8 pypi_0 pypi
[conda] numpy 1.26.1 pypi_0 pypi
[conda] pytorch-ignite 0.4.13 pypi_0 pypi
[conda] pytorch-lightning 2.1.1 pypi_0 pypi
[conda] torch 2.0.0 pypi_0 pypi
[conda] torchaudio 2.0.1 pypi_0 pypi
[conda] torchdata 0.6.0 pypi_0 pypi
[conda] torchinfo 1.8.0 pypi_0 pypi
[conda] torchmetrics 1.2.0 pypi_0 pypi
[conda] torchtext 0.15.1 pypi_0 pypi
[conda] torchvision 0.15.1 pypi_0 pypi

Thank you