PT2E quantized model fails during evaluation

I am new to quantization and am trying to quantize a model by following the tutorial ((prototype) PyTorch 2 Export Post Training Quantization — PyTorch Tutorials 2.4.0+cu121 documentation). In the "Save and Load Quantized Model" section, I hit an error in the final evaluation:

top1, top5 = evaluate(loaded_quantized_model, criterion, data_loader_test)

The error message is:
  File "/data/theoWS/mgi-basecall/code/test_quant0b.py", line 249, in <module>
    top1, top5 = evaluate(loaded_quantized_model, criterion, data_loader_test, _export=True)
  File "/data/theoWS/mgi-basecall/code/test_quant0b.py", line 90, in evaluate
    output = model(image)
  File "/home/theo/anaconda3/envs/mgi/lib/python3.10/site-packages/torch/fx/graph_module.py", line 738, in call_wrapped
    return self._wrapped_call(self, *args, **kwargs)
  File "/home/theo/anaconda3/envs/mgi/lib/python3.10/site-packages/torch/fx/graph_module.py", line 316, in __call__
    raise e
  File "/home/theo/anaconda3/envs/mgi/lib/python3.10/site-packages/torch/fx/graph_module.py", line 303, in __call__
    return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
  File "/home/theo/anaconda3/envs/mgi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/theo/anaconda3/envs/mgi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1582, in _call_impl
    args_kwargs_result = hook(self, args, kwargs)  # type: ignore[misc]
  File "/home/theo/anaconda3/envs/mgi/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 600, in _fn
    return fn(*args, **kwargs)
  File "/home/theo/anaconda3/envs/mgi/lib/python3.10/site-packages/torch/export/_unlift.py", line 33, in _check_input_constraints_pre_hook
    return _check_input_constraints_for_graph(
  File "/home/theo/anaconda3/envs/mgi/lib/python3.10/site-packages/torch/_export/utils.py", line 155, in _check_input_constraints_for_graph
    raise RuntimeError(
RuntimeError: Expected input at *args[0].shape[0] to be equal to 30, but got 50

Has anybody encountered this problem? What is the reason, and how can I fix it? Thank you.

I tried changing both the train and eval batch sizes to 30:
train_batch_size = 30
eval_batch_size = 30

but I still got the error, and the message changed to:
RuntimeError: Expected input at *args[0].shape[0] to be equal to 30, but got 20

So strange!
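Digging a little further, I suspect torch.export.export specializes the graph to the shape of the example inputs: my example batch comes from the train loader (batch size 30), so the loaded program rejects the eval batches of 50; and with eval_batch_size = 30, the last batch of the 50000-image val set has only 50000 mod 30 = 20 images, which would explain the second message. If that is right, the batch dimension needs to be marked dynamic at export time. A minimal sketch of what I plan to try, assuming the torch.export.Dim API available in 2.4 (unverified):

```
# Hedged sketch: export with a dynamic batch dimension so the saved/loaded
# program accepts any batch size. Dim is assumed from torch.export (2.4).
from torch.export import Dim

batch = Dim("batch")              # symbolic batch size
quantized_ep = torch.export.export(
    quantized_model,
    example_inputs,               # one float tensor of shape [N, 3, 224, 224]
    dynamic_shapes=({0: batch},), # dim 0 of the single positional input is dynamic
)
torch.export.save(quantized_ep, pt2e_quantized_model_file_path)
```

If export then complains about 0/1 specialization, I would try constraining the range, e.g. Dim("batch", min=2).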

I also tried writing a new test script, based on the "Save and Load Quantized Model" section of the same tutorial, to load the quantized model (resnet18_pt2e_quantized.pth), but this time I got a different error, a SerializeError. See the following error message:

Traceback (most recent call last):
  File "/home/theo/anaconda3/envs/mgi/lib/python3.10/site-packages/torch/_export/serde/serialize.py", line 1618, in deserialize_graph
    self.deserialize_node(serialized_node, target)
  File "/home/theo/anaconda3/envs/mgi/lib/python3.10/site-packages/torch/_export/serde/serialize.py", line 1690, in deserialize_node
    raise SerializeError(
torch._export.serde.serialize.SerializeError: Unsupported target type for node Node(target='torch.ops.quantized_decomposed.quantize_per_tensor.default', inputs=[NamedArgument(name='input', arg=Argument(as_tensor=TensorArgument(name='x'))), NamedArgument(name='scale', arg=Argument(as_float=0.01864933781325817)), NamedArgument(name='zero_point', arg=Argument(as_int=-14)), NamedArgument(name='quant_min', arg=Argument(as_int=-128)), NamedArgument(name='quant_max', arg=Argument(as_int=127)), NamedArgument(name='dtype', arg=Argument(as_scalar_type=2))], outputs=[Argument(as_tensor=TensorArgument(name='quantize_per_tensor'))], metadata={'stack_trace': '  File "<eval_with_key>.2064", line 7, in forward\n    quantize_per_tensor_default = torch.ops.quantized_decomposed.quantize_per_tensor.default(arg0_1, 0.01864933781325817, -14, -128, 127, torch.int8);  arg0_1 = None\n', 'nn_module_stack': 'L__self__,torch.fx.graph_module.GraphModule.__new__.<locals>.GraphModuleImpl', 'source_fn_stack': 'quantize_per_tensor_default,torch.ops.quantized_decomposed.quantize_per_tensor.default', 'torch_fn': 'quantize_per_tensor.default_1;OpOverload.quantize_per_tensor.default'}): <class 'str'>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/data/theoWS/mgi-basecall/code/test_quant0btest.py", line 124, in <module>
    loaded_quantized_ep = torch.export.load(pt2e_quantized_model_file_path)
  File "/home/theo/anaconda3/envs/mgi/lib/python3.10/site-packages/torch/export/__init__.py", line 300, in load
    return load(
  File "/home/theo/anaconda3/envs/mgi/lib/python3.10/site-packages/torch/_export/__init__.py", line 336, in load
    ep = deserialize(artifact, expected_opset_version)
  File "/home/theo/anaconda3/envs/mgi/lib/python3.10/site-packages/torch/_export/serde/serialize.py", line 2359, in deserialize
    .deserialize(
  File "/home/theo/anaconda3/envs/mgi/lib/python3.10/site-packages/torch/_export/serde/serialize.py", line 2237, in deserialize
    .deserialize(
  File "/home/theo/anaconda3/envs/mgi/lib/python3.10/site-packages/torch/_export/serde/serialize.py", line 1838, in deserialize
    self.deserialize_graph(serialized_graph_module.graph)
  File "/home/theo/anaconda3/envs/mgi/lib/python3.10/site-packages/torch/_export/serde/serialize.py", line 1621, in deserialize_graph
    raise SerializeError(
torch._export.serde.serialize.SerializeError: Failed deserializing node Node(target='torch.ops.quantized_decomposed.quantize_per_tensor.default', inputs=[NamedArgument(name='input', arg=Argument(as_tensor=TensorArgument(name='x'))), NamedArgument(name='scale', arg=Argument(as_float=0.01864933781325817)), NamedArgument(name='zero_point', arg=Argument(as_int=-14)), NamedArgument(name='quant_min', arg=Argument(as_int=-128)), NamedArgument(name='quant_max', arg=Argument(as_int=127)), NamedArgument(name='dtype', arg=Argument(as_scalar_type=2))], outputs=[Argument(as_tensor=TensorArgument(name='quantize_per_tensor'))], metadata={'stack_trace': '  File "<eval_with_key>.2064", line 7, in forward\n    quantize_per_tensor_default = torch.ops.quantized_decomposed.quantize_per_tensor.default(arg0_1, 0.01864933781325817, -14, -128, 127, torch.int8);  arg0_1 = None\n', 'nn_module_stack': 'L__self__,torch.fx.graph_module.GraphModule.__new__.<locals>.GraphModuleImpl', 'source_fn_stack': 'quantize_per_tensor_default,torch.ops.quantized_decomposed.quantize_per_tensor.default', 'torch_fn': 'quantize_per_tensor.default_1;OpOverload.quantize_per_tensor.default'})

A strange error.
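Until the SerializeError itself is understood, the only workaround I can think of is to avoid ExportedProgram serialization altogether and re-run the whole PTQ flow at load time; given the same calibration data and seed it should rebuild an equivalent quantized model. A sketch reusing the helpers from my full script below (this is an assumption, not a confirmed fix):

```
# Hedged workaround sketch: rebuild the quantized model from the float
# checkpoint instead of calling torch.export.load(), bypassing serde entirely.
fresh_model = load_model(saved_model_dir + float_model_file)
fresh_model.eval()
exported = capture_pre_autograd_graph(fresh_model, example_inputs)
prepared = prepare_pt2e(exported, quantizer)
calibrate(prepared, data_loader_test, _export=True)  # redo calibration
rebuilt_quantized_model = convert_pt2e(prepared)
```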

Can you show all your code? I have never seen this error before.

The code is here:

```
#https://pytorch.org/tutorials/prototype/pt2e_quant_ptq.html
import os
import sys
import time
import numpy as np

import torch
import torch.nn as nn
from torch.utils.data import DataLoader

import torchvision
from torchvision import datasets
from torchvision.models.resnet import resnet18
import torchvision.transforms as transforms

# Set up warnings
import warnings
warnings.filterwarnings(
    action='ignore',
    category=DeprecationWarning,
    module=r'.*'
)
warnings.filterwarnings(
    action='default',
    module=r'torch.ao.quantization'
)

# Specify random seed for repeatable results
_ = torch.manual_seed(191009)


class AverageMeter(object):
    """Computes and stores the average and current value"""
    def __init__(self, name, fmt=':f'):
        self.name = name
        self.fmt = fmt
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

    def __str__(self):
        fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})'
        return fmtstr.format(**self.__dict__)


def accuracy(output, target, topk=(1,)):
    """
    Computes the accuracy over the k top predictions for the specified
    values of k.
    """
    with torch.no_grad():
        maxk = max(topk)
        batch_size = target.size(0)

        _, pred = output.topk(maxk, 1, True, True)
        pred = pred.t()
        correct = pred.eq(target.view(1, -1).expand_as(pred))

        res = []
        for k in topk:
            correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)
            res.append(correct_k.mul_(100.0 / batch_size))
        return res


def evaluate(model, criterion, data_loader, _export=False):
    if _export:
        torch.ao.quantization.move_exported_model_to_eval(model)
    else:
        model.eval()

    top1 = AverageMeter('Acc@1', ':6.2f')
    top5 = AverageMeter('Acc@5', ':6.2f')
    cnt = 0
    with torch.no_grad():
        for image, target in data_loader:
            output = model(image)
            loss = criterion(output, target)
            cnt += 1
            acc1, acc5 = accuracy(output, target, topk=(1, 5))
            top1.update(acc1[0], image.size(0))
            top5.update(acc5[0], image.size(0))
    print('')

    return top1, top5

def load_model(model_file):
    model = resnet18(pretrained=False)
    state_dict = torch.load(model_file)
    model.load_state_dict(state_dict)
    model.to("cpu")
    return model

def print_size_of_model(model):
    if isinstance(model, torch.jit.RecursiveScriptModule):
        torch.jit.save(model, "temp.p")
    else:
        torch.jit.save(torch.jit.script(model), "temp.p")
    print("Size (MB):", os.path.getsize("temp.p")/1e6)
    os.remove("temp.p")

def prepare_data_loaders(data_path):
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])
    dataset = torchvision.datasets.ImageNet(
        data_path, split="train", transform=transforms.Compose([
            transforms.RandomResizedCrop(224),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            normalize,
        ]))
    dataset_test = torchvision.datasets.ImageNet(
        data_path, split="val", transform=transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            normalize,
        ]))

    train_sampler = torch.utils.data.RandomSampler(dataset)
    test_sampler = torch.utils.data.SequentialSampler(dataset_test)

    data_loader = torch.utils.data.DataLoader(
        dataset, batch_size=train_batch_size,
        sampler=train_sampler)

    data_loader_test = torch.utils.data.DataLoader(
        dataset_test, batch_size=eval_batch_size,
        sampler=test_sampler)

    return data_loader, data_loader_test

data_path = '/ssd4t/dataset/imagenet'
saved_model_dir = 'data/'
float_model_file = 'resnet18_pretrained_float.pth'

train_batch_size = 30
eval_batch_size = 50

data_loader, data_loader_test = prepare_data_loaders(data_path)
example_inputs = (next(iter(data_loader))[0],)  # note the trailing comma: a one-element tuple
criterion = nn.CrossEntropyLoss()
float_model = load_model(saved_model_dir + float_model_file).to("cpu")
float_model.eval()

# create another instance of the model since
# we need to keep the original model around
model_to_quantize = load_model(saved_model_dir + float_model_file).to("cpu")

model_to_quantize.eval()

from torch._export import capture_pre_autograd_graph

example_inputs = (torch.rand(2, 3, 224, 224),)
exported_model = capture_pre_autograd_graph(model_to_quantize, example_inputs)
# or capture with dynamic dimensions
# from torch._export import dynamic_dim
# exported_model = capture_pre_autograd_graph(model_to_quantize, example_inputs, constraints=[dynamic_dim(example_inputs[0], 0)])
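# note: `constraints=[dynamic_dim(...)]` is deprecated; newer releases take a
# `dynamic_shapes` argument instead (same format as torch.export.export).
# Hedged sketch, assuming capture_pre_autograd_graph accepts it in 2.4:
# from torch.export import Dim
# exported_model = capture_pre_autograd_graph(
#     model_to_quantize, example_inputs, dynamic_shapes=({0: Dim("batch")},))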

from torch.ao.quantization.quantize_pt2e import (
  prepare_pt2e,
  convert_pt2e,
)

from torch.ao.quantization.quantizer.xnnpack_quantizer import (
  XNNPACKQuantizer,
  get_symmetric_quantization_config,
)
quantizer = XNNPACKQuantizer()
quantizer.set_global(get_symmetric_quantization_config())

#quantizer.set_global(qconfig_opt)  # qconfig_opt is an optional quantization config
#    .set_object_type(torch.nn.Conv2d, qconfig_opt) # can be a module type
#    .set_object_type(torch.nn.functional.linear, qconfig_opt) # or torch functional op
#    .set_module_name("foo.bar", qconfig_opt)

prepared_model = prepare_pt2e(exported_model, quantizer)
print(prepared_model.graph)

# The _export option was added to calibrate() and evaluate() because of the
# following error report:
# NotImplementedError: 
#
# Calling train() or eval() is not supported for exported models.
# Please call `torch.ao.quantization.move_exported_model_to_train(model)` (or eval) instead.

# If you cannot replace the calls to `model.train()` and `model.eval()`, you may override
# the behavior for these methods by calling `torch.ao.quantization.allow_exported_model_train_eval(model)`,
# which does the above automatically for you. Note that this has limited effect on switching
# behavior between train and eval modes, and should be used only for special ops such as dropout
# and batchnorm.
#
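# An alternative to threading an _export flag through every helper (hedged,
# quoting the error message above): allow plain .train()/.eval() calls on the
# exported model, with limited effect on special ops like dropout/batchnorm:
#
#   torch.ao.quantization.allow_exported_model_train_eval(prepared_model)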

def calibrate(model, data_loader, _export=False):
    if _export:
        torch.ao.quantization.move_exported_model_to_eval(model)
    else:
        model.eval()
    with torch.no_grad():
        for image, target in data_loader:
            model(image)

calibrate(prepared_model, data_loader_test, _export=True)  # run calibration on sample data

quantized_model = convert_pt2e(prepared_model)
print(quantized_model)

# Baseline model size and accuracy
scripted_float_model_file = "resnet18_scripted.pth"

print("Size of baseline model")
print_size_of_model(float_model)

top1, top5 = evaluate(float_model, criterion, data_loader_test)
print("Baseline Float Model Evaluation accuracy: %2.2f, %2.2f"%(top1.avg, top5.avg))

# Quantized model size and accuracy
print("Size of model after quantization")
#print_size_of_model(quantized_model)
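# print_size_of_model() goes through torch.jit.script(), which may not work on
# the pt2e GraphModule (presumably why the call above is commented out).
# A hedged alternative is to measure the saved state dict directly:
# torch.save(quantized_model.state_dict(), "temp.p")
# print("Size (MB):", os.path.getsize("temp.p") / 1e6)
# os.remove("temp.p")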

top1, top5 = evaluate(quantized_model, criterion, data_loader_test, _export=True)
print("[before serilaization] Evaluation accuracy on test dataset: %2.2f, %2.2f"%(top1.avg, top5.avg))

# 0. Store reference output, for example, inputs, and check evaluation accuracy:
example_inputs = (next(iter(data_loader))[0],)
ref = quantized_model(*example_inputs)
top1, top5 = evaluate(quantized_model, criterion, data_loader_test, _export=True)
print("[before serialization] Evaluation accuracy on test dataset: %2.2f, %2.2f"%(top1.avg, top5.avg))

# 1. Export the model and Save ExportedProgram
pt2e_quantized_model_file_path = saved_model_dir + "resnet18_pt2e_quantized.pth"
# capture the model to get an ExportedProgram
quantized_ep = torch.export.export(quantized_model, example_inputs)
# use torch.export.save to save an ExportedProgram
torch.export.save(quantized_ep, pt2e_quantized_model_file_path)


# 2. Load the saved ExportedProgram
loaded_quantized_ep = torch.export.load(pt2e_quantized_model_file_path)
loaded_quantized_model = loaded_quantized_ep.module()

# 3. Check results for example inputs and check evaluation accuracy again:
res = loaded_quantized_model(*example_inputs)
print("diff:", ref - res)

top1, top5 = evaluate(loaded_quantized_model, criterion, data_loader_test, _export=True)
print("[after serialization/deserialization] Evaluation accuracy on test dataset: %2.2f, %2.2f"%(top1.avg, top5.avg))```

Any updates on this issue?