I am trying to upgrade my existing PyTorch 0.4 model to 1.0 and am attempting to use the Caffe2 backend to run the model in production on the GPU. Here is what I did:
```python
import numpy as np
import onnx
import torch
import caffe2.python.onnx.backend as onnx_caffe2_backend

# Export my model to ONNX.
torch.onnx.export(model, args, "test.onnx", export_params=True)

# Load the ONNX model from file.
model = onnx.load("test.onnx")

# We will run our model on the GPU with ID 3.
rep = onnx_caffe2_backend.prepare(model, device="CUDA:3")
outputs = rep.run(np.random.randn(1, 3, 128, 64).astype(np.float32))
```
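(One optional sanity check, my own addition and not required for any of this: validating the export with `onnx.checker` before handing it to the backend, to rule out a malformed export early.)

```python
import onnx

model = onnx.load("test.onnx")
# Raises an exception if the exported graph is malformed.
onnx.checker.check_model(model)
# Human-readable dump of the graph, handy for confirming inputs/outputs.
print(onnx.helper.printable_graph(model.graph))
```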
Now, I have a couple of questions about this:
1: What if my input data already resides on the GPU? How can I pass that data to the model directly, rather than moving it to the CPU as a numpy array and then passing it to the executor? I tried the following:
```python
# Create the input tensor directly on GPU 3 and pass it to run() as-is.
args = torch.randn(1, 3, 128, 64, dtype=torch.float32).cuda(3)
print(args.dtype)
outputs = rep.run(args)
```
This prints `torch.float32`. However, I get the error:

```
  if arr.dtype == np.dtype('float64'):
TypeError: data type not understood
```

I am not sure why the array is being interpreted as a double array.
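My working assumption (unverified) is that `rep.run` only understands numpy arrays, so the `arr.dtype == np.dtype('float64')` comparison is handed a `torch.dtype` object it cannot parse, not that the data is actually double precision. A workaround sketch under that assumption, which unfortunately reintroduces exactly the device-to-host copy I was trying to avoid:

```python
# Copy the CUDA tensor back to host memory and hand the backend a plain
# numpy array. This incurs the GPU -> CPU transfer, but the dtype check
# then sees a real numpy float32 dtype and passes.
args = torch.randn(1, 3, 128, 64, dtype=torch.float32).cuda(3)
outputs = rep.run(args.detach().cpu().numpy())
```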
2: I noticed that the call to `prepare` is rather slow; overall, it seems my old PyTorch code is faster than running the model on the backend. I will do more exhaustive timing comparisons, but is this the right way to export the model and have it running on the GPU with pytorch/onnx/caffe2?

To elaborate on this point: if I call `prepare` without the GPU option, the call is fast, but specifying the GPU with `onnx_caffe2_backend.prepare(model, device="CUDA:3")` is very slow.
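For those timing comparisons, a rough sketch of what I have in mind (iteration count and shapes are placeholders): measure the one-time `prepare` cost separately from the steady-state per-call cost of `run`, since `prepare` should only need to happen once at startup:

```python
import time

import numpy as np
import onnx
import caffe2.python.onnx.backend as onnx_caffe2_backend

model = onnx.load("test.onnx")

# One-time setup cost: building the Caffe2 net on the GPU.
t0 = time.perf_counter()
rep = onnx_caffe2_backend.prepare(model, device="CUDA:3")
print("prepare: %.3f s" % (time.perf_counter() - t0))

# Steady-state cost: average over repeated calls after one warm-up run.
x = np.random.randn(1, 3, 128, 64).astype(np.float32)
rep.run(x)  # warm-up (allocations, lazy initialization, etc.)
n = 100
t0 = time.perf_counter()
for _ in range(n):
    rep.run(x)
print("run: %.5f s per call" % ((time.perf_counter() - t0) / n))
```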
My system is using:
- Python 3.6.8
- PyTorch 1.0.0
- ONNX 1.3.0
- Ubuntu 16.04
- CUDA 9.0