Error in FX Graph Mode Quantization

Hello,
I tried to quantize an OPT model using FX Graph Mode Quantization.

model_fp is a Hugging Face OPT model.

I followed the quantization tutorial:

model_to_quantize = copy.deepcopy(model_fp)
model_to_quantize.eval()
qconfig_mapping = QConfigMapping().set_global(torch.ao.quantization.default_dynamic_qconfig)

# a tuple of one or more example inputs is needed to trace the model
example_inputs = (input_fp32,)

# prepare
model_prepared = quantize_fx.prepare_fx(model_to_quantize, qconfig_mapping, example_inputs)

I got the following error. Could you help me fix it?

File ~/transformers/models/opt/modeling_opt.py:625 in forward
    raise ValueError("You cannot specify both decoder_input_ids and decoder_inputs_embeds at the same time")

ValueError: You cannot specify both decoder_input_ids and decoder_inputs_embeds at the same time

Can you provide a repro or more details? That error seems unrelated to quantization, given that it's occurring in transformers/models/opt/modeling_opt.

I haven't tried this, but I think you'll need to use the Hugging Face symbolic tracer to trace the model first, and then quantize with our API: transformers/src/transformers/utils/fx.py at main · huggingface/transformers · GitHub
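A minimal sketch of that suggestion, untested against your exact setup: trace the OPT model with `transformers.utils.fx.symbolic_trace` (which understands the model's keyword-argument `forward`), then run the FX quantization API on the traced `GraphModule`. The checkpoint name `facebook/opt-125m` and the input shapes here are illustrative assumptions, not from the original post.

```python
import torch
from torch.ao.quantization import QConfigMapping, default_dynamic_qconfig, quantize_fx
from transformers import OPTForCausalLM
from transformers.utils.fx import symbolic_trace  # Hugging Face's FX tracer

# assumed checkpoint for illustration; substitute your own OPT model
model_fp = OPTForCausalLM.from_pretrained("facebook/opt-125m")
model_fp.eval()

# HF's tracer handles the model's kwargs-based forward(); input_names pins
# down which inputs the traced graph should accept, avoiding the
# "decoder_input_ids and decoder_inputs_embeds" conflict seen above
traced = symbolic_trace(model_fp, input_names=["input_ids", "attention_mask"])

qconfig_mapping = QConfigMapping().set_global(default_dynamic_qconfig)

# example inputs must be a tuple matching input_names, in order
example_input_ids = torch.randint(0, 100, (1, 8))
example_attention_mask = torch.ones(1, 8, dtype=torch.long)
example_inputs = (example_input_ids, example_attention_mask)

model_prepared = quantize_fx.prepare_fx(traced, qconfig_mapping, example_inputs)
model_quantized = quantize_fx.convert_fx(model_prepared)
```

After `convert_fx`, the linear layers should run with dynamically quantized int8 weights; verify outputs against the fp32 model on a few inputs before relying on it.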