Difference between Torch and PyTorch in terms of gradients and initialization

Hi everyone, I'm currently trying to run a PyTorch re-implementation of a Lua Torch codebase. My question is: if all the operations are the same (network structure, learning rate, dropout, etc.), can there still be differences between the PyTorch version and the Torch version that cause a performance gap? For example, do the two frameworks have different default weight initializations or gradient update behaviors? Thanks.
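One place the two frameworks can diverge is default weight initialization. If that's a concern, you can override PyTorch's defaults and initialize layers yourself. Below is a minimal sketch (the helper name `torch7_style_init_` and the uniform `±1/sqrt(fan_in)` scheme are my assumptions about Lua Torch's historical `nn.Linear` default; please verify against the Torch version you're porting from):

```python
import math
import torch
import torch.nn as nn

def torch7_style_init_(module):
    # Hypothetical re-init matching the uniform(-1/sqrt(fan_in), 1/sqrt(fan_in))
    # scheme that (to my knowledge) Lua Torch's nn.Linear used for both
    # weights and biases. Adjust if your Torch version differs.
    if isinstance(module, nn.Linear):
        stdv = 1.0 / math.sqrt(module.weight.size(1))  # size(1) == fan_in
        with torch.no_grad():
            module.weight.uniform_(-stdv, stdv)
            if module.bias is not None:
                module.bias.uniform_(-stdv, stdv)

model = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 5))
model.apply(torch7_style_init_)  # apply recursively to every submodule
```

With both codebases forced to the same init (and the same seed/data order), any remaining gap is more likely due to optimizer or layer-level implementation differences than to initialization.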

Does anyone know the answer?