I am trying to convert a model from Caffe to PyTorch, and I hope to train it with the same parameters and get the same result (within 1% error). Right now I can't: the same parameters give vastly different results. Are there any backend implementation differences between the two frameworks that I should take note of in order to replicate the result?
Or is this simply not possible? Do the small differences add up to so much that I have to use different parameters?
So far, the only difference I know of is that SGD is implemented slightly differently. I have modified it, but I still can't get the same result with the same parameters.
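
For context, here is a minimal sketch of the SGD difference, assuming it is the momentum update: as far as I understand, Caffe folds the learning rate into the momentum buffer, while PyTorch applies the learning rate to the buffer afterwards. The function names are made up for illustration, and this is plain Python on a single scalar weight, not either framework's actual code (no weight decay, dampening, or Nesterov).

```python
def caffe_style_sgd(w, grads, lrs, momentum=0.9):
    """Hypothetical helper: v = momentum * v + lr * g; w = w - v."""
    v = 0.0
    for g, lr in zip(grads, lrs):
        v = momentum * v + lr * g  # learning rate is stored inside the buffer
        w = w - v
    return w


def pytorch_style_sgd(w, grads, lrs, momentum=0.9):
    """Hypothetical helper: buf = momentum * buf + g; w = w - lr * buf."""
    buf = 0.0
    for g, lr in zip(grads, lrs):
        buf = momentum * buf + g  # buffer holds raw gradients only
        w = w - lr * buf          # learning rate is applied at update time
    return w


grads = [1.0, 1.0, 1.0]

# Constant learning rate: both forms give the same weight.
constant = [0.1, 0.1, 0.1]
print(caffe_style_sgd(1.0, grads, constant), pytorch_style_sgd(1.0, grads, constant))

# Learning rate dropped on the last step: the two forms diverge.
stepped = [0.1, 0.1, 0.01]
print(caffe_style_sgd(1.0, grads, stepped), pytorch_style_sgd(1.0, grads, stepped))
```

If this is the right picture, the two updates agree while the learning rate is constant and only drift apart once a schedule changes it, so this alone may not explain a vastly different result.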