torch.onnx.export ignores attention_mask in HF Transformer models

Dear PyTorch community,

I’ve been attempting to export a HuggingFace Transformer model (deberta-v3-large) to the ONNX format, but the exporter consistently drops the attention_mask input. The model’s inputs are input_ids, token_type_ids, and attention_mask. Initially, I suspected constant folding was responsible, since the attention_mask didn’t seem to influence the model’s outputs with my dummy input. However, setting do_constant_folding=False didn’t resolve the issue, even when I used a dummy batch with uneven sequence lengths so that PAD tokens appear and the attention_mask is non-trivial.
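
For reference, here’s a minimal sketch of the kind of export call I’m using (the dummy batch and names are illustrative, not my exact script):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "microsoft/deberta-v3-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Sentence pairs of uneven length, so padding occurs and the
# attention_mask is not all ones.
encoded = tokenizer(
    ["a short premise", "a considerably longer premise that forces padding"],
    ["first hypothesis", "second hypothesis"],
    padding=True,
    return_tensors="pt",
)

torch.onnx.export(
    model,
    # Positional order matches the model's forward signature.
    (encoded["input_ids"], encoded["attention_mask"], encoded["token_type_ids"]),
    "deberta-v3-large.onnx",
    input_names=["input_ids", "attention_mask", "token_type_ids"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "token_type_ids": {0: "batch", 1: "sequence"},
        "logits": {0: "batch"},
    },
    do_constant_folding=False,  # setting this to False did not help
)
```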

So, my question is: if this is not a bug, what’s the logic behind this behavior? Perhaps I am missing something.
And how will the exported model handle batched inputs containing PAD tokens at inference time?
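
For context, this is roughly how I’m checking which inputs actually survive into the exported graph (using onnxruntime, and reusing `encoded` from the sketch above):

```python
import onnxruntime as ort

sess = ort.InferenceSession("deberta-v3-large.onnx")
graph_inputs = [i.name for i in sess.get_inputs()]
print(graph_inputs)  # "attention_mask" is missing from this list in my export

# Feed only the inputs the graph actually kept.
feeds = {name: encoded[name].numpy() for name in graph_inputs}
logits = sess.run(["logits"], feeds)[0]
```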

Thank you.

Related issues:

It appears that the token_type_ids input is also being dropped. The .onnx file produces output logits identical to those of the original .pt model even when token_type_ids is omitted. While debugging manually, I confirmed that the token_type_ids do contain values separating the two sentences (this is a sentence-pair classification task), so it seems DeBERTa simply does not use these tokens. I didn’t find any notes about this in the original GitHub repo (GitHub - microsoft/DeBERTa: The implementation of DeBERTa).
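
For reproducibility, this is roughly the comparison that led me to that conclusion (reusing `model` and `encoded` from the export sketch above; illustrative rather than my exact code):

```python
import numpy as np
import onnxruntime as ort
import torch

sess = ort.InferenceSession("deberta-v3-large.onnx")

# PyTorch reference run, with token_type_ids included.
with torch.no_grad():
    pt_logits = model(**encoded).logits.numpy()

# ONNX run, feeding only the inputs that survived the export
# (token_type_ids is not among them in my case).
feeds = {i.name: encoded[i.name].numpy() for i in sess.get_inputs()}
onnx_logits = sess.run(["logits"], feeds)[0]

# Matches to within numerical noise, i.e. token_type_ids has no effect.
print(np.max(np.abs(pt_logits - onnx_logits)))
```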