Convert PyTorch model to ONNX format (inference results differ)

I have tested with torch.ones(1, 3, 224, 224) as the input and the model works, but with my own inputs (using the same preprocessing as in training) the ONNX model produces different outputs from the torch model.
Code to convert the model to ONNX format:

import torch
import onnx
from onnx_tf.backend import prepare  # only needed if you also convert the ONNX model to TensorFlow

model = model  # the model architecture instantiated earlier
model.load_state_dict(torch.load(PATH, map_location=torch.device('cpu')))
print("Model is loaded")
model.eval()

# Export model to ONNX format
x = torch.randn(1, 3, 224, 224)  # dummy input on the same device as the model (CPU here)
torch.onnx.export(model,
                  x,
                  "vgg16.onnx",
                  opset_version=10,
                  do_constant_folding=True,
                  export_params=True,
                  input_names=["input"],
                  output_names=["output"],
                  verbose=True,
                  dynamic_axes={'input': {0: 'batch_size'},    # variable-length axes
                                'output': {0: 'batch_size'}}
                  )

Code for inference in ONNX:

import numpy as np
import onnxruntime as ort
import matplotlib.pyplot as plt

ort_session = ort.InferenceSession("/content/vgg16.onnx")

x = test_dataset.load_img(1).transpose(2, 0, 1)  # HWC -> CHW
plt.imshow(test_dataset.load_img(1))

def to_numpy(tensor):
    if tensor.requires_grad:
        return tensor.detach().cpu().numpy()
    return tensor.cpu().numpy()

outputs = ort_session.run(
    None,
    {"input": x[None].astype("float32")},
)

# PyTorch reference output on the same input
torch_out = model(torch.from_numpy(x[None].astype("float32")))

# compare ONNX Runtime and PyTorch results
np.testing.assert_allclose(torch_out.detach().cpu(),
                           outputs[0],
                           rtol=1e-03,
                           atol=1e-05)

print("Exported model has been tested with ONNXRuntime, and the result looks good!")

AssertionError:
Not equal to tolerance rtol=0.001, atol=1e-05

Mismatched elements: 36 / 36 (100%)
Max absolute difference: 0.17844993
Max relative difference: 0.8394638
x: array([[0.171307, 0.180779, 0.179579, 0.225714, 0.232095, 0.220075,
0.443109, 0.470671, 0.488748, 0.538834, 0.530197, 0.539141,
0.038368, 0.028497, 0.096283, 0.401647, 0.279558, 0.50373 ,…
y: array([[0.338318, 0.345975, 0.340239, 0.349426, 0.356006, 0.352419,
0.478905, 0.489058, 0.498031, 0.514408, 0.505635, 0.498025,
0.199641, 0.17751 , 0.274733, 0.458645, 0.396497, 0.490221,…

Could you try running model.eval() before running inference with the PyTorch model?
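
For reference, a minimal sketch of what that would look like (dropout and batchnorm layers behave differently in training mode, which is a common source of output mismatches):

model.eval()               # switch dropout/batchnorm to inference behavior
with torch.no_grad():      # no gradients needed for the reference pass
    torch_out = model(x)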

I have applied model.eval() before running inference of the PyTorch model and am still getting this output difference.

I am using the following versions:
Torch version: 1.10.0+cu111
Onnx version: 1.11.0

Are you tracing your model, and if so, do you have data-dependent control flow in the forward pass, which would be traced into a static execution path?
Would scripting the model instead help?
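
To illustrate the failure mode with a toy module (not the code from this thread): tracing bakes in whichever branch the example input happens to take, while scripting preserves the control flow:

import torch

class Gate(torch.nn.Module):
    def forward(self, x):
        # data-dependent branch: tracing records only the path
        # taken by the example input
        if x.sum() > 0:
            return x * 2
        return x - 1

m = Gate().eval()
traced = torch.jit.trace(m, torch.ones(3))   # records the x * 2 branch
scripted = torch.jit.script(m)               # keeps both branches

x = torch.full((3,), -2.0)
print(m(x))         # tensor([-3., -3., -3.])  eager takes the x - 1 branch
print(scripted(x))  # tensor([-3., -3., -3.])  matches eager
print(traced(x))    # tensor([-4., -4., -4.])  stale x * 2 branch baked in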

I am not tracing my model. I am using a pretrained vgg16, fine-tuning it, and saving the model to a .pth file → .onnx.
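
For completeness, the pipeline is roughly the following sketch (assuming a standard torchvision fine-tuning setup; num_classes and PATH are placeholders):

import torch
from torchvision import models

model = models.vgg16(pretrained=True)
model.classifier[6] = torch.nn.Linear(4096, num_classes)  # replace the classification head
# ... fine-tune on the custom dataset ...
torch.save(model.state_dict(), PATH)  # the .pth file that is later exported to .onnx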

My ONNX graph looks like this:
[screenshot of the exported ONNX graph]

torch.onnx.export would trace the model as described in the docs:

Exports a model into ONNX format. If model is not a torch.jit.ScriptModule nor a torch.jit.ScriptFunction, this runs model once in order to convert it to a TorchScript graph to be exported (the equivalent of torch.jit.trace()). Thus this has the same limited support for dynamic control flow as torch.jit.trace().

In any case, I cannot reproduce the issue and get the same results up to the expected difference due to the limited floating point precision:

# setup
import numpy as np
import torch
import onnxruntime as ort
from torchvision import models

model = models.vgg16().eval()
x = torch.randn(1, 3, 224, 224)

# PyTorch reference output
out = model(x)

# export to ONNX
torch.onnx.export(
    model,
    x,
    'vgg.onnx',
    input_names = ["input"],
    output_names =["output"],
    verbose=True,
    dynamic_axes={'input' : {0 : 'batch_size'},
                  'output' : {0 : 'batch_size'}}
)

# ONNX reference output
ort_session = ort.InferenceSession("vgg.onnx")
outputs = ort_session.run(
    None,
    {"input": x.numpy()},
)

# compare ONNX Runtime and PyTorch results
print(np.max(np.abs((out.detach().numpy() - outputs))))
# > 8.6426735e-07

Thanks for the help. I managed to resolve the issue. It turns out it was something in the preprocessing: when I use an image from the PyTorch data loader to evaluate in ONNX it works, but my own custom data loader, which I built to load data before it is passed into the torch data loader, seems to cause this problem somehow.
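
If anyone hits the same thing, a minimal parity check between a custom loader and the training-time transforms might look like this (the transform below assumes standard ImageNet preprocessing, and custom_load stands in for your own loader):

import numpy as np
from PIL import Image
from torchvision import transforms

# reference pipeline -- assumed here to be standard ImageNet preprocessing;
# substitute whatever was actually used during training
reference = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),                       # HWC uint8 -> CHW float in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("sample.jpg").convert("RGB")
ref_input = reference(img).numpy()               # (3, 224, 224) float32

custom_input = custom_load("sample.jpg")         # hypothetical custom loader

# any difference here will show up in the ONNX vs. PyTorch comparison
np.testing.assert_allclose(custom_input, ref_input, rtol=1e-5, atol=1e-6)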

Hey @ptrblck, I'm having a similar issue and I've used the above code to export with no success.

This is how you can reproduce it:

from transformers import AutoModel
import numpy as np
import onnxruntime as ort
import torch


def to_numpy(tensor):
    # detach in case the tensor still requires grad
    return tensor.detach().cpu().numpy()


model_name = "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext"

model = AutoModel.from_pretrained(model_name)
model.eval()


ids = torch.randint(low=0, high=30000, size=(1, 128)).type(torch.LongTensor)
mask = torch.ones((1, 128)).type(torch.LongTensor)

torch.onnx.export(
    model,
    (ids, mask),
    "onnx/test.onnx",
    opset_version=13,
    input_names=["ids", "mask"],
    output_names=["output"],
    export_params=True,
    dynamic_axes={
        "ids": {0: "batch_size"},
        "mask": {0: "batch_size"},
        "output": {0: "batch_size"},
    },
)

# get onnx_model outputs
onnx_model = ort.InferenceSession(
    "onnx/test.onnx", providers=["CPUExecutionProvider"]
)
onnx_input = {
    "ids": to_numpy(ids),
    "mask": to_numpy(mask),
}
onnx_x = onnx_model.run(None, onnx_input)  # [(1,128,768), (1,768)]

# get torch model outputs
x = model(ids, mask)  # [(1,128,768), (1,768)]

# check difference
# note: x[0].shape[0] is just the batch size (the integer 1), so this
# computes 1 - onnx_output rather than an elementwise tensor difference
delta = x[0].shape[0] - onnx_x[0][0]
print(delta.min(), delta.max(), delta.mean(), delta.std())

# elementwise comparison of the actual tensors, with tolerance
# (a plain == assert on multi-element tensors raises an error)
np.testing.assert_allclose(to_numpy(x[0]), onnx_x[0], rtol=1e-3, atol=1e-5)

The output of that script is -5.3224783 14.314435 1.0189513 0.5448166, which suggests the tensors are, on average, about 1.01 units apart, and this impacts my downstream task. Do you know what the issue could be?

versions:

transformers       4.20.1
torch              1.12.0
onnx               1.11.0
onnxruntime        1.11.1
onnxruntime-tools  1.7.0 

Hi there! I know this post is old, but maybe someone needs more direction. I was able to resolve this difference; after several changes, my code sample is:

import numpy as np
import onnxruntime
import torch

# Variable is deprecated since PyTorch 0.4; a plain tensor works
dummy_input = torch.randn(1, 3, 224, 224).to("cuda")
# include your PyTorch model creation or loading here
model.to("cuda")
model.eval()  # make sure dropout/batchnorm layers are in inference mode

# save to ONNX
torch.onnx.export(model,              # model being run
                  dummy_input,        # model input (or a tuple for multiple inputs)
                  "model.onnx",       # where to save the model (can be a file or file-like object)
                  export_params=True,        # store the trained parameter weights inside the model file
                  opset_version=11,          # the ONNX version to export the model to
                  do_constant_folding=True)  # whether to execute constant folding for optimization

torch_out = model(dummy_input)
ort_session = onnxruntime.InferenceSession("model.onnx")

def to_numpy(tensor):
    return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

# compute ONNX Runtime output prediction
ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(dummy_input)}
ort_outs = ort_session.run(None, ort_inputs)

# compare ONNX Runtime and PyTorch results
np.testing.assert_allclose(to_numpy(torch_out), ort_outs[0], rtol=1e-03, atol=1e-05)

print("Exported model has been tested with ONNXRuntime, and the result looks good!")

Exported model has been tested with ONNXRuntime, and the result looks good!