I am trying to upgrade my existing PyTorch 0.4 model to 1.0 and am attempting to use the Caffe2 backend to run the model in production on the GPU. Here is what I did:
```python
import numpy as np
import onnx
import torch
import caffe2.python.onnx.backend as onnx_caffe2_backend

# Export my model to ONNX.
torch.onnx.export(model, args, "test.onnx", export_params=True)

# Load the ONNX model from file.
model = onnx.load("test.onnx")

# We will run our model on the GPU with ID 3.
rep = onnx_caffe2_backend.prepare(model, device="CUDA:3")
outputs = rep.run(np.random.randn(1, 3, 128, 64).astype(np.float32))
```
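(One optional sanity check, my own addition and not required for any of this: validating the export with `onnx.checker` before handing it to the backend, to rule out a malformed export early.)

```python
import onnx

model = onnx.load("test.onnx")
# Raises an exception if the exported graph is malformed.
onnx.checker.check_model(model)
# Human-readable dump of the graph, handy for confirming inputs/outputs.
print(onnx.helper.printable_graph(model.graph))
```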
Now, I have a couple of questions about this:
1: What if my input data already resides on the GPU? How can I pass that data to the model directly, rather than moving it to the CPU as a numpy array and then passing it to the executor? I tried the following:
```python
# Create the input tensor directly on GPU 3 and pass it to run() as-is.
args = torch.randn(1, 3, 128, 64, dtype=torch.float32).cuda(3)
print(args.dtype)
outputs = rep.run(args)
```
This prints `torch.float32`. However, I get the error:

```
  if arr.dtype == np.dtype('float64'):
TypeError: data type not understood
```

I am not sure why the array is being interpreted as a double array.
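My working assumption (unverified) is that `rep.run` only understands numpy arrays, so the `arr.dtype == np.dtype('float64')` comparison is handed a `torch.dtype` object it cannot parse, not that the data is actually double precision. A workaround sketch under that assumption, which unfortunately reintroduces exactly the device-to-host copy I was trying to avoid:

```python
# Copy the CUDA tensor back to host memory and hand the backend a plain
# numpy array. This incurs the GPU -> CPU transfer, but the dtype check
# then sees a real numpy float32 dtype and passes.
args = torch.randn(1, 3, 128, 64, dtype=torch.float32).cuda(3)
outputs = rep.run(args.detach().cpu().numpy())
```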
2: I noticed that the call to `prepare` is rather slow; overall, it seems my old PyTorch code is faster than running the model on the backend. I will do more exhaustive timing comparisons, but is this the right way to export the model and have it running on the GPU with pytorch/onnx/caffe2?

To elaborate on this point: if I call `prepare` without the GPU option, the call is fast, but specifying the GPU with `onnx_caffe2_backend.prepare(model, device="CUDA:3")` is very slow.
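For those timing comparisons, a rough sketch of what I have in mind (iteration count and shapes are placeholders): measure the one-time `prepare` cost separately from the steady-state per-call cost of `run`, since `prepare` should only need to happen once at startup:

```python
import time

import numpy as np
import onnx
import caffe2.python.onnx.backend as onnx_caffe2_backend

model = onnx.load("test.onnx")

# One-time setup cost: building the Caffe2 net on the GPU.
t0 = time.perf_counter()
rep = onnx_caffe2_backend.prepare(model, device="CUDA:3")
print("prepare: %.3f s" % (time.perf_counter() - t0))

# Steady-state cost: average over repeated calls after one warm-up run.
x = np.random.randn(1, 3, 128, 64).astype(np.float32)
rep.run(x)  # warm-up (allocations, lazy initialization, etc.)
n = 100
t0 = time.perf_counter()
for _ in range(n):
    rep.run(x)
print("run: %.5f s per call" % ((time.perf_counter() - t0) / n))
```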
My system is using:
- Python 3.6.8
- PyTorch 1.0.0
- ONNX 1.3.0
- Ubuntu 16.04
- CUDA 9.0