I have fixed this problem. It seems to be the same issue as the one in "Gradient computed on CPU but nor computed on GPU?".
However, I still don’t quite understand why the solution given in the discussion above works.
Besides, I believe my original test cases already used the first solution suggested there, yet they still failed.