I tried three ways to run a torch.nn.GRU model on a CPU. The model is:
model = nn.GRU(512, 256, batch_first=True, bidirectional=True)
1. run it directly with PyTorch;
2. convert it to TorchScript and run it with C++;
3. convert it to ONNX and run it with Python.
Each test was run 100 times to get an average. The result: TorchScript with C++ is much slower than the other two. PyTorch and ONNX each take about 40 ms per run, but C++ takes about 120 ms!
All three paths use the same MKL-DNN backend, so such a big performance gap seems unexpected. Does anyone know why, and how to improve the performance in C++?
Thanks very much!