I am new to PyTorch quantization.
What is the difference between pytorch-quantization and torch.ao.quantization?
from pytorch_quantization import nn as quant_nn
from pytorch_quantization import quant_modules
from pytorch_quantization import calib
from torch.ao.quantization import get_default_qconfig, QConfigMapping, default_qconfig
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx, fuse_fx
It's just our old directory; torch.ao.quantization is the recommended location. You can directly replace all instances of torch.quantization with torch.ao.quantization. (Note that the reverse doesn't work: most new functions and classes developed after the move weren't duplicated to torch.quantization.)
Our team started off doing only quantization and was granted the top-level torch.quantization folder. Then we started working on pruning and sparsity and developed various numerical debugging tools. To avoid needing five top-level folders, we moved everything to the top-level torch.ao (architecture optimization) folder, with quantization as a subfolder of it. We left torch.quantization in place, but it just imports things from torch.ao.quantization to maintain backward compatibility (BC). That is temporary, and it will eventually be removed.
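For anyone migrating existing code, the namespace replacement described above could be sketched as a small source-rewriting helper. This is only an illustration: the function name migrate_namespace and the regex are hypothetical and not part of PyTorch. It rewrites torch.quantization to torch.ao.quantization and leaves torch.ao.quantization and pytorch_quantization untouched.

```python
import re

# Match "torch.quantization" only when it is a complete dotted prefix:
# not preceded by a word character or dot (so "mytorch.quantization" and
# "foo.torch.quantization" are skipped) and not followed by a word
# character (so "torch.quantization_utils" is skipped).
_OLD_NAMESPACE = re.compile(r"(?<![\w.])torch\.quantization\b")


def migrate_namespace(source: str) -> str:
    """Return source text with torch.quantization rewritten to torch.ao.quantization.

    Hypothetical helper for illustration; it does a plain textual rewrite
    and does not check that the imported names exist in the new location.
    """
    return _OLD_NAMESPACE.sub("torch.ao.quantization", source)


if __name__ == "__main__":
    old_code = "from torch.quantization import get_default_qconfig"
    print(migrate_namespace(old_code))
```

Because the pattern never matches text that already contains torch.ao.quantization, running the helper twice on the same file should be harmless.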
Thank you very much for your reply. I think what you describe is the difference between torch.quantization and torch.ao.quantization, which is similar to this answer. What I am unsure about is whether torch.quantization/torch.ao.quantization and pytorch-quantization are two different packages (the former from PyTorch, the latter from NVIDIA?). Can both be used for quantization, just with different APIs?
Oh sorry, yeah, I misunderstood. The pytorch-quantization tool in TensorRT/tools is something made by NVIDIA to simulate quantized numerics. It doesn't look like it's been updated recently, so I'd assume it's a bit outdated. However, the TensorRT library is useful for lowering to different backends and is something that our ao quantization APIs will use under the hood in certain situations.
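To make "simulate quantized numerics" concrete, here is a minimal pure-Python sketch of fake quantization: values are rounded onto an int8 grid and immediately converted back to floats, so the rest of the model sees the rounding error without any integer kernels. It assumes symmetric per-tensor quantization with the scale taken from the maximum absolute value. The function fake_quantize is hypothetical and is not the pytorch-quantization or torch.ao.quantization API.

```python
def fake_quantize(values, num_bits=8):
    """Quantize floats to a signed integer grid and dequantize them again.

    Assumption for illustration: symmetric, per-tensor quantization with
    scale = max(|x|) / qmax. Real toolkits may pick the range differently
    (for example via calibration).
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    amax = max(abs(v) for v in values)
    if amax == 0:
        # All zeros: nothing to scale, return the values unchanged.
        return [float(v) for v in values]
    scale = amax / qmax
    result = []
    for v in values:
        # Round to the nearest integer level and clamp to [-qmax, qmax].
        q = max(-qmax, min(qmax, round(v / scale)))
        # Map the integer level back to a float.
        result.append(q * scale)
    return result


if __name__ == "__main__":
    weights = [0.0, 0.5, -1.0, 0.3]
    print(fake_quantize(weights))
```

With this scheme, each output differs from its input by at most half a quantization step (scale / 2).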
@HDCharles Is pytorch_quantization still the recommended toolkit for converting models to use TensorRT (essentially: take a transformer, quantize it, and use int8 GEMM ops on GPU at inference)? Or is there a better-supported TensorRT workflow?
I've tried asking the same in the TensorRT repo, but it appears the NVIDIA folks still suggest using pytorch_quantization: [question] Difference of pytorch_quantization modules and torch.ao.quantization · Issue #3095 · NVIDIA/TensorRT · GitHub
It depends on what you need, I think. If you are using TensorRT, following their suggestion is probably better.
If you are using PyTorch ao quantization, you'll need to find people from TensorRT to support lowering a model produced by ao quantization to TensorRT. We have some examples here: https://github.com/pytorch/TensorRT/blob/main/py/torch_tensorrt/fx/test/quant/test_quant_trt.py, but we are not really using this path right now, and I'm not sure how fast the response will be if something is wrong with the converter.