Converting BERT models to ONNX

Hi there,

I am trying to convert a BERT model to ONNX, but I think there is a discrepancy in the ONNX conversion module. To check, I ran the sample conversion from the PyTorch tutorial "(optional) Exporting a Model from PyTorch to ONNX and Running it using ONNX Runtime" (PyTorch Tutorials 1.10.1+cu102 documentation).

The test at the end of the tutorial does not succeed for me: there is a significant numerical error between the output of the PyTorch model and the output of the ONNX model generated by the conversion.
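
For context, the failing check compares the two outputs elementwise with np.testing.assert_allclose. Below is a minimal NumPy-only sketch of the statistics that assertion reports (mismatched elements, max absolute difference, max relative difference). The helper name report_mismatch and the small stand-in arrays are hypothetical and not part of the tutorial; in the real script the inputs would be to_numpy(torch_out) and ort_outs[0].

```python
import numpy as np


def report_mismatch(actual, desired, rtol=1e-3, atol=1e-5):
    """Summarize the elementwise difference between two arrays.

    Hypothetical helper: uses the same tolerance criterion as
    np.testing.assert_allclose, |actual - desired| <= atol + rtol * |desired|.
    Assumes both arrays are non-empty and have the same shape.
    """
    actual = np.asarray(actual, dtype=np.float64)
    desired = np.asarray(desired, dtype=np.float64)
    abs_diff = np.abs(actual - desired)

    # Elements that violate the tolerance.
    mismatched = abs_diff > atol + rtol * np.abs(desired)

    # Relative difference is only defined where desired is non-zero.
    nonzero = desired != 0
    rel_diff = abs_diff[nonzero] / np.abs(desired[nonzero])

    return {
        "mismatched": int(mismatched.sum()),
        "total": int(desired.size),
        "max_abs": float(abs_diff.max()),
        "max_rel": float(rel_diff.max()) if rel_diff.size else 0.0,
    }


# Stand-in data; replace with to_numpy(torch_out) and ort_outs[0].
torch_like = np.array([0.178333, 0.240012, 0.491639])
onnx_like = np.array([0.620332, 0.832479, 1.062318])
print(report_mismatch(torch_like, onnx_like))
```

When nearly every element is mismatched, as in the output below, the two models are probably computing something different rather than disagreeing by floating-point rounding, though this sketch alone cannot say why.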

My environment:

Python (3.8.10)
ONNX (1.10.2)
NumPy (1.21.2)
ONNXRuntime (1.10.0)

Update: I am using the latest branches of PyTorch and ONNX, both built from source.
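
Since the paths in the traceback below mention python3.7 while the list above says Python 3.8.10, it may be worth printing the versions from inside the same script that runs the conversion. This is a small sketch, assuming Python 3.8+ (for importlib.metadata); installed_versions is a hypothetical helper, and the distribution names passed to it are assumed to be the usual PyPI ones, which may not match a source build.

```python
import sys
from importlib import metadata


def installed_versions(packages):
    """Return the interpreter version plus the version of each package.

    Hypothetical helper: packages that are not installed in the
    running interpreter are reported as None instead of raising.
    """
    versions = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names are assumptions; adjust them to your install.
for name, version in installed_versions(
    ["torch", "onnx", "onnxruntime", "numpy"]
).items():
    print(f"{name}: {version}")
```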

Here’s a sample output when I run the code:

Traceback (most recent call last):
  File "sample_convert.py", line 96, in <module>
    np.testing.assert_allclose(to_numpy(torch_out), ort_outs[0], rtol=1e-03, atol=1e-05)
  File "/usr/local/lib/python3.7/site-packages/numpy/testing/_private/utils.py", line 1531, in assert_allclose
    verbose=verbose, header=header, equal_nan=equal_nan)
  File "/usr/local/lib/python3.7/site-packages/numpy/testing/_private/utils.py", line 844, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=0.001, atol=1e-05

Mismatched elements: 451416 / 451584 (100%)
Max absolute difference: 7.256867
Max relative difference: 317452.75
 x: array([[[[ 0.178333,  0.240012,  0.491639, ...,  0.368835,  0.33173 ,
           0.327067],
         [ 0.153973,  0.266167,  0.592167, ...,  0.480333,  0.449961,...
 y: array([[[[ 0.620332,  0.832479,  1.062318, ..., -0.410807, -0.471132,
          -0.290673],
         [ 0.496837,  0.832677,  1.17443 , ..., -0.337473, -0.433076,...