Input image with int?

When training a model, image values are usually converted to float via normalization, e.g.:

    normalize = torchvision.transforms.Normalize(
        mean=[xx,xx,xx], std=[xx,xx,xx]
    )
    train_dataset = torchvision.datasets.ImageFolder(TRAIN_DIR,
            torchvision.transforms.Compose([
                torchvision.transforms.RandomHorizontalFlip(), # random flip
                torchvision.transforms.RandomCrop(image_size), # cropping image
                torchvision.transforms.ToTensor(),
                normalize,
            ]))

So the input is float here. I have just a naive question: is it possible to feed integer input to the model to make training/inference even faster? (Is it possible to do this with a quantized INT8 model?)

I found this post:

If you are not going to train your neural network and you want to run on CPU, then datatypes like torch.uint8 may help you as you can achieve more instructions per time interval (i.e. your application should run faster).

Maybe INT8 input is possible?
Best regards,


Yes, this is possible. Quantization — PyTorch master documentation describes quantization support in PyTorch. Usually people leave their inputs in fp32 and let the framework quantize most of the layers.

If you would like to provide a quantized integer input, you can use the set_input_quantized_indexes function on prepare_fx’s prepare_custom_config object: prepare_fx — PyTorch master documentation
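
For illustration, a minimal sketch of what that might look like (assuming `model_to_quantize` and `example_inputs` are already defined; this is a sketch, not a complete recipe):

    from torch.ao.quantization import QConfigMapping, get_default_qconfig
    from torch.ao.quantization.fx.custom_config import PrepareCustomConfig
    from torch.ao.quantization.quantize_fx import prepare_fx

    # Declare that positional input 0 will already be a quantized (quint8) tensor,
    # so prepare_fx will not insert a quantize step for it.
    prepare_custom_config = PrepareCustomConfig().set_input_quantized_indexes([0])
    qconfig_mapping = QConfigMapping().set_global(get_default_qconfig("fbgemm"))
    prepared = prepare_fx(model_to_quantize, qconfig_mapping, example_inputs,
                          prepare_custom_config=prepare_custom_config)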


Thank you for the comment!
It’s solved.

Where can I get torch 1.13 to use prepare_fx: prepare_fx — PyTorch master documentation
pip cannot install it?

You can install the nightly binaries by selecting them on the website.
1.13.0 hasn’t been released yet, and the docs point to the current upstream/nightly build.


Following the tutorial “(prototype) FX Graph Mode Post Training Static Quantization”, I defined a PrepareCustomConfig and tried to use prepare_fx so that I could make inference faster by inputting integer data.

# 5. Prepare the Model for Post Training Static Quantization
# prepare_fx folds BatchNorm modules into previous Conv2d modules,
# and inserts observers in appropriate places in the model.

from torch.ao.quantization import QConfigMapping, get_default_qconfig
from torch.ao.quantization.fx.custom_config import PrepareCustomConfig
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

input_fp32 = torch.randn(4, 1, 224, 224)
example_inputs = (input_fp32,)  # prepare_fx expects a tuple of example inputs

qconfig = get_default_qconfig("fbgemm")
qconfig_mapping = QConfigMapping().set_global(qconfig)
prepare_custom_config = PrepareCustomConfig()
# declare that input 0 / output 0 are already-quantized (quint8) tensors
prepare_custom_config.set_input_quantized_indexes([0])
prepare_custom_config.set_output_quantized_indexes([0])

prepared_model = prepare_fx(model_to_quantize, qconfig_mapping, example_inputs, prepare_custom_config)

def calibrate(model, data_loader):
    model.eval()
    with torch.no_grad():
        for image, target in data_loader:
            model(image)
calibrate(prepared_model, data_loader_test)  # run calibration on sample data

# 7. Convert the Model to a Quantized Model
# convert_fx takes a calibrated model and produces a quantized model.
quantized_model = convert_fx(prepared_model)

But when inputting (image) data, I encountered an error.

    with torch.no_grad():
        for image, target in data_loader:
            #image = image.to(torch.uint8)
            print(f"image.dtype = {image.dtype}, image.shap = {image.shape}")
            output = model(image)

image.dtype = torch.float32, image.shape = torch.Size([50, 3, 224, 224])

Traceback (most recent call last):
  File "/mnt/d/v1.1.0/network/test_FX-GRAPH-MODE-QUANTIZATION/test.py", line 256, in <module>
    top1, top5 = evaluate(quantized_model, criterion, data_loader_test)
  File "/mnt/d/v1.1.0/network/test_FX-GRAPH-MODE-QUANTIZATION/test.py", line 102, in evaluate
    output = model(image)
  File "/mnt/d/v1.1.0/network/venv_torch1.13/lib/python3.9/site-packages/torch/fx/graph_module.py", line 652, in call_wrapped
    return self._wrapped_call(self, *args, **kwargs)
  File "/mnt/d/v1.1.0/network/venv_torch1.13/lib/python3.9/site-packages/torch/fx/graph_module.py", line 277, in __call__
    raise e
  File "/mnt/d/v1.1.0/network/venv_torch1.13/lib/python3.9/site-packages/torch/fx/graph_module.py", line 267, in __call__
    return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
  File "/mnt/d/v1.1.0/network/venv_torch1.13/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1186, in _call_impl
    return forward_call(*input, **kwargs)
  File "<eval_with_key>.10", line 5, in forward
  File "/mnt/d/v1.1.0/network/venv_torch1.13/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1186, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/d/v1.1.0/network/venv_torch1.13/lib/python3.9/site-packages/torch/nn/intrinsic/quantized/modules/conv_relu.py", line 92, in forward
    return torch.ops.quantized.conv2d_relu(
  File "/mnt/d/v1.1.0/network/venv_torch1.13/lib/python3.9/site-packages/torch/_ops.py", line 148, in __call__
    return self._op(*args, **kwargs or {})
NotImplementedError: Could not run 'quantized::conv2d_relu.new' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'quantized::conv2d_relu.new' is only available for these backends: [Conjugate, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseCPU, SparseCUDA, SparseHIP, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseVE, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID].

Is this still unavailable?
This is just a question to understand the current state of PyTorch.
Best regards

Your example should work if you remove the prepare_custom_config.set_input_quantized_indexes([0]) line. You are passing in images which are tensors with dtype float32, so there isn’t a need to specify that your model input is quantized.

If you set prepare_custom_config.set_input_quantized_indexes([0]), you are telling the workflow that you intend to pass in tensors with dtype torch.quint8. This is an extra optimization for the cases where the input tensors coming into the model are already quantized, which does not seem to be the case in your example.
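
For illustration only, producing such a quantized input from a float image might look roughly like this; the scale and zero_point below are made-up placeholders and would need to match the quantization parameters the converted model expects at its input:

    import torch

    # Hypothetical input quantization parameters (placeholders, not taken from the model).
    scale, zero_point = 1.0 / 255.0, 0
    q_image = torch.quantize_per_tensor(image, scale, zero_point, torch.quint8)
    output = quantized_model(q_image)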

Thanks for the advice. Certainly, it worked when I commented out both set_input_quantized_indexes and set_output_quantized_indexes.

But actually, what I’d like to test is whether inference speed gets faster with “integer input data + int8 model”. A test with “float input data + int8 model”, which is the general usage, is not what I’d like to test…

To test the former, I set set_input_quantized_indexes in the config. So what I had to do was convert the float32 data to torch.quint8?


Does a model using eager mode for QAT have similar functionality?

Currently, I have a quantized model where image inputs are normalized from [0, 255] to [0.0, 1.0], and then quantized to [0, 18] (scale: 0.05404205247759819).

Is there any way to ensure that the input tensor is quantized to [0, 255]?