torch.onnx.export ignores attention_mask in HF Transformer models

Dear PyTorch community,

I’ve been attempting to export a HuggingFace Transformer model (deberta-v3-large) to the ONNX format, but the exporter consistently drops the attention_mask input. The model’s inputs are input_ids, token_type_ids, and attention_mask. Initially, I suspected constant folding was responsible, since the attention_mask didn’t seem to influence the model’s outputs with my dummy input. However, setting do_constant_folding=False didn’t resolve the issue, even when I used a dummy batch with uneven sequence lengths so that PAD tokens appear and the attention_mask is non-trivial.
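
For reference, here’s a minimal sketch of the kind of export call I’m using (the dummy batch and names are illustrative, not my exact script):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "microsoft/deberta-v3-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Sentence pairs of uneven length, so padding occurs and the
# attention_mask is not all ones.
encoded = tokenizer(
    ["a short premise", "a considerably longer premise that forces padding"],
    ["first hypothesis", "second hypothesis"],
    padding=True,
    return_tensors="pt",
)

torch.onnx.export(
    model,
    # Positional order matches the model's forward signature.
    (encoded["input_ids"], encoded["attention_mask"], encoded["token_type_ids"]),
    "deberta-v3-large.onnx",
    input_names=["input_ids", "attention_mask", "token_type_ids"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "token_type_ids": {0: "batch", 1: "sequence"},
        "logits": {0: "batch"},
    },
    do_constant_folding=False,  # setting this to False did not help
)
```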

So, my question is: if this is not a bug, what’s the logic behind this behavior? Perhaps I am missing something.
And how will the exported model handle batched inputs containing PAD tokens at inference time?
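
For context, this is roughly how I’m checking which inputs actually survive into the exported graph (using onnxruntime, and reusing `encoded` from the sketch above):

```python
import onnxruntime as ort

sess = ort.InferenceSession("deberta-v3-large.onnx")
graph_inputs = [i.name for i in sess.get_inputs()]
print(graph_inputs)  # "attention_mask" is missing from this list in my export

# Feed only the inputs the graph actually kept.
feeds = {name: encoded[name].numpy() for name in graph_inputs}
logits = sess.run(["logits"], feeds)[0]
```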

Thank you.

Related issues:

It appears that the token_type_ids input is also being dropped. The .onnx file produces output logits identical to those of the original .pt model even when token_type_ids is omitted. While debugging manually, I confirmed that the token_type_ids do contain values separating the two sentences (this is a sentence-pair classification task), so it seems DeBERTa simply does not use these tokens. I didn’t find any notes about this in the original GitHub repo (GitHub - microsoft/DeBERTa: The implementation of DeBERTa).
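
For reproducibility, this is roughly the comparison that led me to that conclusion (reusing `model` and `encoded` from the export sketch above; illustrative rather than my exact code):

```python
import numpy as np
import onnxruntime as ort
import torch

sess = ort.InferenceSession("deberta-v3-large.onnx")

# PyTorch reference run, with token_type_ids included.
with torch.no_grad():
    pt_logits = model(**encoded).logits.numpy()

# ONNX run, feeding only the inputs that survived the export
# (token_type_ids is not among them in my case).
feeds = {i.name: encoded[i.name].numpy() for i in sess.get_inputs()}
onnx_logits = sess.run(["logits"], feeds)[0]

# Matches to within numerical noise, i.e. token_type_ids has no effect.
print(np.max(np.abs(pt_logits - onnx_logits)))
```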