Error in converting ssdlite object detection to onnx

TorchingAround · September 30, 2021, 10:35am

I tried to convert ssdlite320_mobilenet_v3_large to ONNX.
The problem is that exporting the model to ONNX gives no error at all, and neither does checking the validity of the model using
onnx.checker.check_model(onnx_model)
and

ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(dummy)}
ort_outs = ort_session.run(None, ort_inputs)
# compare ONNX Runtime and PyTorch results
np.testing.assert_allclose(np.array(torch_out[0]['scores'].detach()), np.array(ort_outs[1]), rtol=1e-03, atol=1e-05)
I got no error there either, as long as I used the same dummy tensor value.
So I thought the export worked well.
Then I tried to run a prediction on another randn tensor of the same size:

testTorch = torch.randn(1, 3, 320, 320)
ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(testTorch)}
ort_outs = ort_session.run(None, ort_inputs)  # this line raises the error

But it fails with this error:
Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running TopK node. Name:‘TopK_2630’ Status Message: k argument [300] should not be greater than specified axis dim value [286]
If I change the tensor back to the dummy one, the error does not occur.

THE WHOLE CODE:

!pip3 install onnx onnxruntime
import torch
from torch import nn
import torchvision
import numpy as np
import onnx
import onnxruntime  # needed for onnxruntime.InferenceSession below

mobilenetssd = torchvision.models.detection.ssdlite320_mobilenet_v3_large(pretrained=True, pretrained_backbone=True)
mobilenetssd.eval()

dummy = torch.randn(1, 3, 320, 320)
torch_out = mobilenetssd(dummy.detach())
torch.onnx.export(mobilenetssd, # model being run
    dummy, # model input (or a tuple for multiple inputs)
    "mobilenet.onnx", # where to save the model (can be a file or file-like object)
    export_params=True, # store the trained parameter weights inside the model file
    opset_version=11, # the ONNX version to export the model to
    do_constant_folding=True, # whether to execute constant folding for optimization
    input_names = ['input'], # the model's input names
    output_names = ['output'], # the model's output names
    dynamic_axes={'input' : {0 : 'batch_size'}, # variable length axes
    'output' : {0 : 'batch_size'}}
)

onnx_model = onnx.load("mobilenet.onnx")
onnx.checker.check_model(onnx_model)
ort_session = onnxruntime.InferenceSession("mobilenet.onnx")

def to_numpy(tensor):
    return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()
    
# compute ONNX Runtime output prediction
ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(dummy)}
ort_outs = ort_session.run(None, ort_inputs)
# compare ONNX Runtime and PyTorch results
np.testing.assert_allclose(np.array(torch_out[0]['scores'].detach()), np.array(ort_outs[1]), rtol=1e-03, atol=1e-05)
print("Exported model has been tested with ONNXRuntime, and the result looks good!")

testTorch = torch.randn(1, 3, 320, 320)
ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(testTorch)}
ort_outs = ort_session.run(None, ort_inputs)  # the error occurs here whenever I use a tensor other than the dummy one

TorchingAround · September 30, 2021, 10:52am

I tried it once more on Google Colab, and it gave no error again.
But when I create a new cell and run the code below,
it gives the same error again. I don't understand what is happening.

testTorch = torch.randn(1, 3, 320, 320)
ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(testTorch)}
ort_outs = ort_session.run(None, ort_inputs)

Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running TopK node. Name:‘TopK_3231’ Status Message: k argument [300] should not be greater than specified axis dim value [293]

ptrblck · September 30, 2021, 9:00pm

I’m not familiar with the ONNX export of this model, but note that SSD could be using data-dependent processing based on the input. I.e. the failing operation might assume that at least 300 “candidates” are found and select the topK from them. However, since you are using a random input, I guess this particular tensor/list might be smaller.
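
A rough, library-free sketch of that hypothesis follows. It is not the actual torchvision or ONNX Runtime code: the helper names `strict_top_k` and `filter_candidates`, the constant `TOPK_CANDIDATES`, and the 0.5 threshold are all made up for illustration. It assumes the export froze `k` at the value seen for the export-time input, which is a guess consistent with the error message rather than something confirmed in this thread.

```python
def strict_top_k(scores, k):
    """Top-k that, like the failing TopK node, rejects k larger than the input."""
    if k > len(scores):
        raise ValueError(
            f"k argument [{k}] should not be greater than "
            f"specified axis dim value [{len(scores)}]"
        )
    return sorted(scores, reverse=True)[:k]


def filter_candidates(scores, score_thresh):
    """Keep only scores above the threshold; how many survive depends on the input."""
    return [s for s in scores if s > score_thresh]


TOPK_CANDIDATES = 300  # hypothetical cap, chosen to match the k in the error message

# "Export-time" input: 499 candidates survive, so min(300, n) evaluates to 300.
export_scores = filter_candidates([i / 1000 for i in range(1000)], 0.5)
frozen_k = min(TOPK_CANDIDATES, len(export_scores))  # assumed to be baked in as a constant

# A different input for which only 286 candidates survive the threshold.
other_scores = filter_candidates([i / 1000 for i in range(787)], 0.5)

try:
    strict_top_k(other_scores, frozen_k)  # reuses the frozen k
    failed = False
    message = ""
except ValueError as err:
    failed = True
    message = str(err)

# Recomputing k for each input avoids the failure.
dynamic = strict_top_k(other_scores, min(TOPK_CANDIDATES, len(other_scores)))

print(failed, message)
print(len(dynamic))
```

If this is what happens, the dummy tensor used for export would always pass, since `k` was derived from it, while any input that produces fewer candidates would fail in the same way as the runs above.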

goksinan · August 22, 2022, 4:48pm

I am facing the same problem even though I used an actual test image during the ONNX export. The PyTorch model works fine, and there is no problem during conversion. But when using ORT, I always get the error regardless of the input image type. I wonder if we need to set any parameters while building the PyTorch model for evaluation.
[E:onnxruntime:, sequential_executor.cc:368 onnxruntime::SequentialExecutor::Execute] Non-zero status code returned while running TopK node. Name:‘TopK_1254’ Status Message: k argument [4] should not be greater than specified axis dim value [3]