ONNX -> Caffe2 for Recurrent Models issue

lysuhin · July 27, 2018, 9:46am

Hello!

I need to convert a model containing a recurrent module (GRU) from PyTorch to Caffe2.
As far as I can see, the only way to do that is through ONNX.

My minimal example is the following:

import sys
import numpy as np

import torch
import torch.onnx

import onnx

import caffe2.python.onnx.backend as backend
from caffe2.python.onnx.backend import Caffe2Backend


def main():

	# step 0: prepare model which takes sequences of size 8 as input and has 1 forward hidden layer of size 4.

	model_pytorch = torch.nn.GRU(input_size=8,
								 hidden_size=4,
								 num_layers=1)
	model_pytorch.eval()

	x = torch.randn(2, 1, 8)  # seq_len x batch x input_size
	h = torch.zeros(1, 1, 4)  # (num_layers * n_directions) x batch x hidden_size

	try:
		_ = model_pytorch(x, h)  # checking that model inference is OK
	except Exception as e:  # RuntimeError is already a subclass of Exception
		print(e)
		print(' ===== Unsuccessful model inference run, exiting ===== ')
		return
	finally:
		print(' ===== Step 0 finished ===== ')

	# step 1: convert to ONNX

	onnx_proto_output = "temp.onnx"

	try:
		torch.onnx.export(model_pytorch, (x, h), onnx_proto_output, export_params=True, verbose=True)
	except Exception as e:
		print(e)
		print(' ===== Unsuccessful pytorch->ONNX run, exiting ===== ')
		return
	finally:
		print(' ===== Step 1 finished ===== ')

	# step 2: check ONNX model using Caffe2-ONNX backend
	
	model_onnx = onnx.load(onnx_proto_output)
	onnx.checker.check_model(model_onnx)  # raises if the model is invalid; returns None
	print(onnx.helper.printable_graph(model_onnx.graph))
	
	x_ = x.numpy()
	h_ = h.numpy()

	try:
		outputs = backend.run_model(model_onnx, (x_, h_))
		print(outputs)
	except Exception as e:
		print(e)
		print(' ===== Unsuccessful Caffe2.onnx run, exiting ===== ')
		return
	finally:
		print(' ===== Step 2 finished ===== ')
	
	# step 3: save model to caffe2 format

	try:
		init_net, predict_net = Caffe2Backend.onnx_graph_to_caffe2_net(model_onnx)
		with open('init_net.pb', "wb") as f:
			f.write(init_net.SerializeToString())
		with open('predict_net.pb', "wb") as f:
			f.write(predict_net.SerializeToString())
	except Exception as e:
		print(e)
		print(' ===== Unsuccessful Caffe2 save run, exiting ===== ')
		return
	finally:
		print(' ===== Step 3 finished ===== ')


if __name__ == '__main__':
	main()

It fails on line 57, at the run_model(...) call in step 2:

ONNX FATAL: list index out of range

A bit of digging shows that the error comes from around line 425 of this file:

424 if x.name == W:
425     input_size = x.type.tensor_type.shape.dim[2].dim_value
426     break

It turns out that in this case the matrix in question has only 2 dims (12x8), so dim[2] does not exist:

name: "10"
type {
  tensor_type {
    elem_type: FLOAT
    shape {
      dim {
        dim_value: 12
      }
      dim {
        dim_value: 8
      }
    }
  }
}
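
For reference, the 12x8 shape is consistent with a bare (3 * hidden_size) x input_size matrix for hidden_size=4 and input_size=8, whereas the ONNX GRU operator documents W as [num_directions, 3 * hidden_size, input_size]. The snippet below is only a sketch of that mismatch in plain Python: the helper name input_size_from_weight_dims is hypothetical and is not part of the Caffe2 importer.

```python
# Sketch only: this helper is an illustrative assumption, not the actual
# Caffe2 ONNX importer code.

def input_size_from_weight_dims(dims):
    """Return input_size from the dims of a GRU input-weight tensor.

    Accepts either the ONNX layout [num_directions, 3*hidden_size, input_size]
    or a bare 2-D [3*hidden_size, input_size] matrix.
    """
    if len(dims) == 3:
        return dims[2]
    if len(dims) == 2:
        return dims[1]
    raise ValueError("expected a 2-D or 3-D weight shape, got %r" % (dims,))


hidden_size, input_size = 4, 8

onnx_style = [1, 3 * hidden_size, input_size]  # layout that dim[2] indexing assumes
exported = [3 * hidden_size, input_size]       # the 12x8 matrix from the dump above

# Indexing position 2 of the 2-D shape reproduces the reported failure mode.
try:
    exported[2]
except IndexError as e:
    print("IndexError:", e)

print(input_size_from_weight_dims(onnx_style))
print(input_size_from_weight_dims(exported))
```

Both calls recover 8 as the input size; only the direct index into the 2-D shape fails.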

I wonder if there’s a version mismatch between the way ONNX and Caffe2 handle these models?
If so, what’s a good way of fixing that?

I use PyTorch 0.4.0, ONNX 1.2.1 (latest from source), Caffe2 (latest from source).

Thanks!

lysuhin · July 30, 2018, 11:09am

OK, with no answers so far, I seem to have found a workaround that suits my particular case.
I hardcoded the necessary size at line 410 and commented out the Reshape/Squeeze block that follows it (i.e. reverting the commit).

Kandarp_Makwana · August 1, 2018, 2:39am

I hit the same issue. After looking at the PyTorch ONNX exporter code and the Caffe2 ONNX importer code, I realized that the importer expects the LSTM parameters in a slightly different format, and that it was a version conflict.

Updating to PyTorch 0.4.1 (released 5 days ago) resolved my issue. It might be worth trying for you.
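
Since the fix here came down to the installed version, a script could guard against the older release before attempting the export. This is a minimal sketch in plain Python; parse_version and is_at_least are hypothetical helpers, and the 0.4.1 threshold is simply the version reported to work in this thread.

```python
# Sketch only: parse_version and is_at_least are hypothetical helpers.

def parse_version(text):
    """Turn '0.4.1' or '0.4.1.post2' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component
        parts.append(int(digits))
    return tuple(parts)


def is_at_least(installed, required):
    """Compare two dotted version strings numerically."""
    return parse_version(installed) >= parse_version(required)


print(is_at_least("0.4.0", "0.4.1"))
print(is_at_least("0.4.1", "0.4.1"))
```

In a real script the first argument would typically be torch.__version__.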