I am trying to optimize a CPU model using oneDNN, following this tutorial: https://pytorch.org/tutorials/recipes/recipes/tuning_guide.html#use-onednn-graph-with-torchscript-for-inference
The problem is that I observe a huge performance difference at inference between the model exported for CPU with PyTorch and an equivalent model implemented directly with the oneDNN library version 1.6: about 300 microseconds in the first case versus less than 20 microseconds in the second.
The procedure I am following with the model in Python to prepare it for CPU is this:
model = model.to("cpu")
model = torch.jit.trace(model, torch.rand(shape))
model = torch.jit.freeze(model)
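For reference, this is roughly how I understand the tutorial's full recipe, including the oneDNN Graph fusion switch and the warm-up runs it mentions (the tiny model and input shape here are placeholders, and `torch.jit.enable_onednn_fusion` requires a reasonably recent PyTorch):

```python
import torch
import torch.nn as nn

# Stand-in model -- substitute your own (assumption: the real model
# and input shape come from your project).
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))
model = model.to("cpu").eval()  # freeze() requires eval mode

# Enable oneDNN Graph fusion *before* tracing; this is the switch the
# tuning guide refers to.
torch.jit.enable_onednn_fusion(True)

shape = (1, 64)
example = torch.rand(shape)

with torch.no_grad():
    traced = torch.jit.trace(model, example)
    frozen = torch.jit.freeze(traced)
    # Warm-up: the profiling JIT only specializes and applies the
    # fused oneDNN Graph kernels after a few calls.
    for _ in range(3):
        frozen(example)

out = frozen(example)
```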
After that, in C++, I use it as follows:
module = torch::jit::load(path);
output = module.forward(tensor_inputs).toTensor();
Since I was not getting the expected results, I also tried Intel's intel-extension-for-pytorch library, adding this line before the torch.jit.trace call:
model = ipex.optimize(model, dtype=torch.float32)
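In other words, my understanding is that ipex.optimize should be applied to the eager model before tracing, roughly like this (the model here is a placeholder, and the import is guarded so the sketch still runs without intel-extension-for-pytorch installed):

```python
import torch
import torch.nn as nn

# Stand-in model (assumption -- substitute your own).
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10)).eval()

# ipex.optimize runs on the eager model *before* torch.jit.trace;
# guard the import so the script degrades gracefully without it.
try:
    import intel_extension_for_pytorch as ipex
    model = ipex.optimize(model, dtype=torch.float32)
except ImportError:
    pass  # fall back to the plain model

example = torch.rand(1, 64)
with torch.no_grad():
    traced = torch.jit.trace(model, example)
    frozen = torch.jit.freeze(traced)

out = frozen(example)
```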
The inference time is still around 300 microseconds. Also, running ldd on the compiled executable and comparing with that tutorial, I notice that I am missing the library "libdnnl_graph.so". I am not sure where it should come from, because I cannot find it in any of the aforementioned libraries.
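In case it matters, this is how I am measuring latency on the Python side: with warm-up iterations first, since the profiling JIT is much slower on the first few calls, and taking the median over many runs (the model and shape below are placeholders):

```python
import time
import torch
import torch.nn as nn

# Stand-in model and input (assumptions -- substitute your own).
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10)).eval()
example = torch.rand(1, 64)

with torch.no_grad():
    frozen = torch.jit.freeze(torch.jit.trace(model, example))

    # Warm-up: the JIT only specializes and fuses after a few calls,
    # so timing the first iterations would be misleading.
    for _ in range(20):
        frozen(example)

    # Median over many iterations is more robust to scheduler noise
    # than a single measurement.
    samples = []
    for _ in range(100):
        t0 = time.perf_counter()
        frozen(example)
        samples.append((time.perf_counter() - t0) * 1e6)  # microseconds

samples.sort()
median_us = samples[len(samples) // 2]
print(f"median latency: {median_us:.1f} us")
```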
Any idea or suggestion of what could be happening? Does the optimization process for CPU seem correct to you? Is this huge difference in performance normal?