It seems that the final value of momentum in SGD is (learning_rate * momentum), which does not match the standard SGD equations.
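To make the difference concrete, here is a minimal sketch assuming the classical momentum update (v = momentum * v - learning_rate * grad, then w = w + v). The function `sgd_momentum`, the toy objective f(w) = w**2, and the hyperparameter values are all hypothetical and only for illustration; this is not the code of any particular library. The second call passes learning_rate * momentum as the momentum coefficient to mimic the behaviour described above.

```python
def sgd_momentum(grad_fn, w0, learning_rate, momentum, steps):
    """Classical SGD with momentum on a scalar parameter (illustrative only)."""
    w, v = w0, 0.0
    for _ in range(steps):
        # Velocity keeps a `momentum` fraction of its previous value.
        v = momentum * v - learning_rate * grad_fn(w)
        w = w + v
    return w


def grad(w):
    # Gradient of the toy objective f(w) = w**2.
    return 2.0 * w


lr, mu = 0.1, 0.9

# Standard equations: the momentum coefficient is mu.
standard = sgd_momentum(grad, 1.0, lr, mu, steps=2)

# Assumed variant: the momentum coefficient ends up as lr * mu.
scaled = sgd_momentum(grad, 1.0, lr, lr * mu, steps=2)

print(standard, scaled)
```

The two runs agree on the first step, because the velocity starts at zero, and diverge from the second step onward. With lr < 1 the scaled variant carries over much less of the previous velocity than the standard equations would.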