I found that the optimizer changes values in the embedding layer that are not used in the computational graph.
The change is very small, e.g. 1e-10, but it could add up to something serious when training for 1M iterations.
This comes from the sparse tensor operations.
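To make the symptom concrete, here is a minimal, framework-agnostic sketch of how this kind of drift could be detected. It assumes the embedding table can be copied out as nested lists before and after one optimizer step. The helper name `changed_unused_rows` and the toy values are hypothetical and do not come from any library:

```python
def changed_unused_rows(before, after, used_indices, tol=0.0):
    """Return {row_index: max_abs_change} for rows that were NOT looked up
    in the batch but whose values still differ by more than `tol`."""
    used = set(used_indices)
    drift = {}
    for row, (old, new) in enumerate(zip(before, after)):
        if row in used:
            continue  # rows used in the graph are expected to change
        max_change = max(abs(n - o) for o, n in zip(old, new))
        if max_change > tol:
            drift[row] = max_change
    return drift


# Toy snapshot (hypothetical values): only row 1 was looked up in the batch,
# row 0 is untouched, and row 2 moved by about 1e-10 without being used.
before = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
after = [[0.1, 0.2], [0.25, 0.45], [0.5, 0.6 + 1e-10]]

drift = changed_unused_rows(before, after, used_indices=[1])
print(drift)  # reports row 2 only, with a change of roughly 1e-10
```

Running a check like this after every step (or every N steps) would show whether the unused rows keep moving in the same direction or just jitter around their original values.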
Has anyone else encountered this situation? If so, how did you solve it?