Exporting an XNNPACK-quantized graph with torch.export to an ExportedProgram fails with torch._dynamo.exc.TorchRuntimeError: aten.gather.default: Expected dtype int64 for index, but got torch.float32

Hello All,

I’m new to ExecuTorch and am trying to export the mDeBERTa model so it can run on edge devices.
I am able to export the mDeBERTa model with torch.export and then use the XNNPACKQuantizer to get a quantized model from the calibrated model. But when I try to export this quantized graph to the ExportedProgram format, it fails with the following error:

```
torch._dynamo.exc.TorchRuntimeError: Failed running call_function aten.gather.default(*(FakeTensor(…, size=(12, 28, 512)), -1, FakeTensor(…, size=(12, 28, 28))), **{}):
gather(): Expected dtype int64 for index, but got torch.float32
```
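For reference, the dtype requirement itself reproduces outside the model. This is my own minimal sketch (the shapes are copied from the error message above), independent of the failing export:

```python
import torch

# aten.gather requires an int64 (long) index tensor.
src = torch.randn(12, 28, 512)
idx = torch.zeros(12, 28, 28)                     # float32 index tensor
# torch.gather(src, -1, idx)                      # RuntimeError: gather(): Expected dtype int64 for index
out = torch.gather(src, -1, idx.to(torch.int64))  # works once the index is int64
print(out.shape)                                  # torch.Size([12, 28, 28])
```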

Model Name: “MoritzLaurer/mDeBERTa-v3-base-mnli-xnli”

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# from torch.export import export_for_training  # alternative API also tried
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    get_symmetric_quantization_config,
    XNNPACKQuantizer,
)

# For Executorch 
from torch.export import export, ExportedProgram
from executorch.exir import to_edge

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
premise = "Angela Merkel ist eine Politikerin in Deutschland und Vorsitzende der CDU"
hypothesis = "Emmanuel Macron is the President of France"
model_name = "MoritzLaurer/mDeBERTa-v3-base-mnli-xnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, _fast_init=False, torchscript=True)
input = tokenizer(premise, hypothesis, truncation=True, return_tensors="pt")

input_shape = [251000,768]
input_data = input["input_ids"].to(dtype=torch.int64)  # ensure the token ids are int64
print("Input Data: ", input_data, " Datatype: ", input_data.dtype)
aten_dialect: ExportedProgram = export(model, (input_data,))

print("Got Aten operation")

quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
# prepared_graph = prepare_pt2e(aten_dialect, quantizer)  # first attempt, on the ExportedProgram
exported_model = capture_pre_autograd_graph(model, (input_data,))
prepared_graph = prepare_pt2e(exported_model, quantizer)  # insert observers for calibration
converted_graph = convert_pt2e(prepared_graph)
print("Quantized Graph")


print("Input Data: ", input_data, " Datatype: ", input_data.type())
inpdatai64 = torch.tensor(input_data.tolist(), dtype=torch.int64)
print("Input Data Datatype: ", inpdatai64.type())

# ISSUE: this export fails inside the gather operation with the error above
aten_dialect1: ExportedProgram = export(converted_graph, (inpdatai64,))
print("ATen Dialect Graph")```

Note that I have also tried export with dynamic dimensions, but I hit the same error. What I notice is that tracing creates FakeTensors while performing the export operation.
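For reference, this is roughly what the dynamic-dimension attempt looked like (a sketch; the seq_len name and bounds are mine, assuming the torch.export.Dim API is available in this version):

```python
from torch.export import Dim, export

# Mark dim 1 (the sequence length) of the single positional input as dynamic.
seq_len = Dim("seq_len", min=2, max=512)
aten_dialect1 = export(converted_graph, (inpdatai64,), dynamic_shapes=({1: seq_len},))
```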

Is there any known issue with FakeTensor when executing the gather operation, or is there another way I should pass the input tuple (instead of (inpdatai64,)) to the export function? Is there anything I am missing that leads to this export failure?

Hi @dcpiyush, sorry for the delay. Could you please file a GitHub issue with instructions for reproducing the problem, along with information about your development environment?