Friends, help me! Recently I was running open-source model code from a paper, and after some structural changes I switched the weight initialization from the Xavier method to the Kaiming method. The final result improved by 3 points compared with Xavier init. Is this possible? Or could there be a bug somewhere that makes the final result wrong?
Initial values are just that: initial.
Those methods were proposed to prevent gradient explosion/vanishing at the start of training.
I don't think there's a critical difference between them.
No problem:)
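For intuition, the two schemes differ only in the scale of the initial weights. A minimal NumPy sketch (the fan sizes below are made-up examples, not from your model) comparing the standard deviations each method would use for one linear layer:

```python
import numpy as np

def xavier_std(fan_in, fan_out):
    # Glorot/Xavier: balances forward and backward variance,
    # derived for symmetric activations like tanh
    return np.sqrt(2.0 / (fan_in + fan_out))

def kaiming_std(fan_in):
    # He/Kaiming: compensates for ReLU zeroing half the activations
    return np.sqrt(2.0 / fan_in)

fan_in, fan_out = 512, 512  # hypothetical layer sizes
print(f"xavier std:  {xavier_std(fan_in, fan_out):.4f}")   # ~0.0442
print(f"kaiming std: {kaiming_std(fan_in):.4f}")           # ~0.0625
```

So for a ReLU network, Kaiming starts with somewhat larger weights; both aim at the same goal of keeping activation variance roughly constant across layers.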
But then why is the final result so different, when nothing else in the model structure changed?
- Train for more epochs
- Run multiple trainings to validate that the result is stable
- Some tasks do show different results depending on the initializer

These would help.
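The second suggestion is the key one: a 3-point gap from a single run can easily be seed noise. A minimal sketch of a multi-seed check, where `train_and_eval` is a hypothetical placeholder for your actual training run:

```python
import random
import statistics

def train_and_eval(seed):
    # Placeholder: replace with your real training + evaluation.
    # Here we just simulate run-to-run variation around a fixed score.
    random.seed(seed)
    return 80.0 + random.gauss(0, 1)  # hypothetical accuracy

scores = [train_and_eval(seed) for seed in range(5)]
print(f"mean={statistics.mean(scores):.2f}  std={statistics.stdev(scores):.2f}")
```

If the mean gap between the two initializers is well outside the per-init standard deviation over several seeds, the improvement is likely real rather than luck.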
Thanks a lot! I will try these methods.