I’m not sure if this issue has been posted before, but here it goes.
I’ve been training networks on two different machines with the same run parameters, but I’m getting very different results. The only difference is that on one machine I use version 1.9.0 and on the other 1.13.0.
I’m wondering what could be so different between these two releases to produce such an effect. Is the default weight initialization the same in both versions?
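
In case it helps narrow this down, here is a minimal sketch of how the initialization on the two machines could be compared. It assumes the freshly initialized weights can be exported as flat lists of floats keyed by layer name; `summarize` and `differing_layers` are hypothetical helper names I made up for illustration, not part of any library, and the tolerance is an arbitrary choice.

```python
import statistics


def summarize(weights):
    """Return per-layer mean and population std for a dict of flat float lists."""
    return {
        name: {
            "mean": statistics.fmean(values),
            "std": statistics.pstdev(values),
        }
        for name, values in weights.items()
    }


def differing_layers(summary_a, summary_b, tol=1e-3):
    """List layers present in both summaries whose mean or std differ by more than tol."""
    diffs = []
    for name in sorted(summary_a.keys() & summary_b.keys()):
        a, b = summary_a[name], summary_b[name]
        if abs(a["mean"] - b["mean"]) > tol or abs(a["std"] - b["std"]) > tol:
            diffs.append(name)
    return diffs
```

The idea would be to run `summarize` right after building the model on each machine, save the result, and pass both summaries to `differing_layers`. Because the weights are random, only the statistics are compared, not the individual values. An empty list would suggest the default initialization has a similar distribution in both versions, so the difference would have to come from somewhere else.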