Error Description
I encountered a model that produces different outputs when run in onnxruntime and in Glow.
The output of onnxruntime is
[-7.4040246e-01 -1.0558732e+00 -5.1735401e-01 5.7158279e+00
-6.1898971e-01 4.8943150e-01 -5.3290629e-01 -7.5109071e-01
-1.0089742e+00 -9.7962457e-01 -2.0205033e-01 -5.6859505e-01
-8.3097053e-01 -6.0609245e-01 -9.2153203e-01 -8.8401639e-01
-8.7050164e-01 -9.2254686e-01 6.3326435e+00 -5.2634275e-01
-2.1332520e-01 -3.9393574e-02 -8.1680071e-01 -7.4010009e-01
-9.0249908e-01 -9.0031087e-01 -8.4391761e-01 -8.9972514e-01
5.8119040e+00 -4.5583522e-01 5.3716774e+00 -7.6865733e-01
-3.2050097e-01 -6.3881731e-01 -7.8147006e-01 -8.8674474e-01
-8.7157512e-01 -7.5937271e-01 1.0185689e-03 -3.4555185e-01]
The output of Glow is
[-0.742133, -1.054049, -0.511033, 5.701681, -0.617301, 0.488157, -0.531513, -0.749289, -1.006998, -0.977479, -0.201742, -0.567398, -0.830031, -0.606550, -0.920508, -0.882669, -0.869515, -0.921491, 6.325564, -0.525662, -0.213413, -0.039500, -0.816155, -0.739523, -0.901777, -0.899409, -0.843245, -0.898985, 5.806691, -0.454686, 5.361451, -0.770803, -0.321736, -0.639595, -0.781732, -0.886845, -0.871701, -0.757481, 0.010961, -0.342509]
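To quantify the mismatch, the two outputs can be compared element by element. The following is a minimal sketch using only the Python standard library; the helper names `max_abs_diff` and `beyond_tolerance` and the 5e-3 threshold are illustrative choices, not part of either runtime.

```python
# Values copied from the two outputs quoted above.
ORT = [
    -7.4040246e-01, -1.0558732e+00, -5.1735401e-01, 5.7158279e+00,
    -6.1898971e-01, 4.8943150e-01, -5.3290629e-01, -7.5109071e-01,
    -1.0089742e+00, -9.7962457e-01, -2.0205033e-01, -5.6859505e-01,
    -8.3097053e-01, -6.0609245e-01, -9.2153203e-01, -8.8401639e-01,
    -8.7050164e-01, -9.2254686e-01, 6.3326435e+00, -5.2634275e-01,
    -2.1332520e-01, -3.9393574e-02, -8.1680071e-01, -7.4010009e-01,
    -9.0249908e-01, -9.0031087e-01, -8.4391761e-01, -8.9972514e-01,
    5.8119040e+00, -4.5583522e-01, 5.3716774e+00, -7.6865733e-01,
    -3.2050097e-01, -6.3881731e-01, -7.8147006e-01, -8.8674474e-01,
    -8.7157512e-01, -7.5937271e-01, 1.0185689e-03, -3.4555185e-01,
]
GLOW = [
    -0.742133, -1.054049, -0.511033, 5.701681, -0.617301, 0.488157,
    -0.531513, -0.749289, -1.006998, -0.977479, -0.201742, -0.567398,
    -0.830031, -0.606550, -0.920508, -0.882669, -0.869515, -0.921491,
    6.325564, -0.525662, -0.213413, -0.039500, -0.816155, -0.739523,
    -0.901777, -0.899409, -0.843245, -0.898985, 5.806691, -0.454686,
    5.361451, -0.770803, -0.321736, -0.639595, -0.781732, -0.886845,
    -0.871701, -0.757481, 0.010961, -0.342509,
]


def max_abs_diff(a, b):
    """Return (index, difference) of the largest element-wise absolute gap."""
    if len(a) != len(b):
        raise ValueError("outputs must have the same length")
    diffs = [abs(x - y) for x, y in zip(a, b)]
    idx = max(range(len(diffs)), key=diffs.__getitem__)
    return idx, diffs[idx]


def beyond_tolerance(a, b, atol):
    """Return the indices whose absolute difference exceeds atol."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if abs(x - y) > atol]


idx, diff = max_abs_diff(ORT, GLOW)
print(f"largest gap: index {idx}, |diff| = {diff:.6f}")
# 5e-3 is an arbitrary threshold chosen for illustration.
print("indices off by more than 5e-3:", beyond_tolerance(ORT, GLOW, 5e-3))
```

The gaps are well above what float32 rounding alone would normally explain, which is consistent with a different tensor being read rather than a precision difference.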
Cause Analysis
According to the model definition, node Mul_1024 should take the output of node Sub_1023 as input.
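One way to confirm this wiring from the model file is to map each input of a node to the node that produces it. The sketch below is an assumption-laden illustration: `producers_of` is a hypothetical helper, and the stand-in nodes use made-up tensor names (`x`, `a`, `sub_out`, `scale`, `mul_out`) rather than the real ones from the model.

```python
from types import SimpleNamespace


def producers_of(nodes, consumer_name):
    """Map each input tensor of `consumer_name` to the node that produces it.

    Inputs that no node produces (graph inputs, initializers) map to None.
    """
    produced_by = {out: node.name for node in nodes for out in node.output}
    consumer = next(node for node in nodes if node.name == consumer_name)
    return {inp: produced_by.get(inp) for inp in consumer.input}


# Stand-in nodes with hypothetical tensor names, shaped like ONNX NodeProto
# objects (each has .name, .input and .output).
toy_graph = [
    SimpleNamespace(name="Sub_1023", input=["x", "a"], output=["sub_out"]),
    SimpleNamespace(name="Mul_1024", input=["sub_out", "scale"], output=["mul_out"]),
]

print(producers_of(toy_graph, "Mul_1024"))
```

With the `onnx` package installed, the same function should accept `onnx.load(path).graph.node` directly, where `path` is the model file from the reproduction package.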
If you use the -dump-ir-after-all-passes option to view the IR generated by Glow, you will see the following after the ShareBuffers optimization pass:
...
34 %Sub_1023__1 = elementsub @out %Sub_1023__1_res, @in %Sub_1023__1_res, @in %A1227_transposed
...
98 %Sub_2268__1 = elementsub @out %Sub_1023__1_res, @in %Sub_1023__1_res, @in %A1227_transposed
...
...
...
179 %Sub_1023__1_res__3 = tensorview @in %Sub_1023__1_res { Ty: float<4 x 64 x 1 x 1>, Offsets:[0, 0, 0, 0]} // Users: @in 180
180 %Mul_1024 = elementmul @out %Mul_1024_res, @in %Sub_1023__1_res__3, @in %Add_3784_res
...
In line 98, node Sub_2268 writes its output to the buffer that is supposed to hold the output of node Sub_1023. So in practice, in line 180, the input to node Mul_1024 is the output of node Sub_2268 rather than that of Sub_1023, which is the cause of the wrong output value.
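A dumped IR can be scanned for this pattern automatically. The sketch below assumes the dump lines look like the excerpt above (an index, a `%name`, an opcode, then `@out`/`@in` operands); `writers_by_buffer` is a hypothetical helper and is not part of Glow. It lists every buffer that more than one instruction writes to.

```python
import re
from collections import defaultdict

# Matches lines shaped like the excerpt: "<index> %<name> = <opcode> <operands>"
INSTR = re.compile(r"^\s*(\d+)\s+%(\S+)\s*=\s*(\w+)\s+(.*)$")
OUT_OPERAND = re.compile(r"@out\s+%(\w+)")


def writers_by_buffer(ir_text):
    """Return {buffer: [(index, instruction), ...]} for buffers written more than once."""
    writers = defaultdict(list)
    for line in ir_text.splitlines():
        match = INSTR.match(line)
        if not match:
            continue  # skip elisions ("...") and anything that is not an instruction
        index, name, _opcode, operands = match.groups()
        for buf in OUT_OPERAND.findall(operands):
            writers[buf].append((int(index), name))
    return {buf: w for buf, w in writers.items() if len(w) > 1}


# The four instructions quoted in this report.
EXCERPT = """\
34 %Sub_1023__1 = elementsub @out %Sub_1023__1_res, @in %Sub_1023__1_res, @in %A1227_transposed
98 %Sub_2268__1 = elementsub @out %Sub_1023__1_res, @in %Sub_1023__1_res, @in %A1227_transposed
179 %Sub_1023__1_res__3 = tensorview @in %Sub_1023__1_res { Ty: float<4 x 64 x 1 x 1>, Offsets:[0, 0, 0, 0]} // Users: @in 180
180 %Mul_1024 = elementmul @out %Mul_1024_res, @in %Sub_1023__1_res__3, @in %Add_3784_res
"""

for buf, instrs in writers_by_buffer(EXCERPT).items():
    print(buf, "is written by", instrs)
```

A buffer with several writers is not necessarily a bug, since in-place chains can reuse a buffer legitimately. A hit only marks a place where it is worth checking that no later reader still expects the earlier value.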
How To Reproduce
You can download the package from rep.zip (Google Drive). After unzipping it, follow the steps in README.md to reproduce the results.
System Information
Glow version: built from GitHub source commit 07a82bd9fe97dfd2e8ea0f4742dce5ce86177c2b
onnxruntime version: 1.7.0
onnx version: 1.9.0
Operating system: Ubuntu 18.04 LTS
CPU: Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz 16 cores
BTW, we have also reported a similar issue, which also seems to be due to IR translation, at [BUG] Error in IR Parsing.