-
Because otherwise it will reinitialize the hidden state to zeros, which means it does not remember anything across time steps. It works, but performance won't be as good.
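To make the difference concrete, here is a minimal framework-free sketch. It assumes a toy scalar tanh cell with made-up weights (`w_x`, `w_h`) and hypothetical helper names (`rnn_step`, `run`); it is not the code from the original question, just an illustration of carrying the hidden state versus resetting it every step.

```python
import math

def rnn_step(x, h, w_x=0.5, w_h=0.8):
    """One step of a toy scalar RNN cell: h_new = tanh(w_x * x + w_h * h).
    The weights are arbitrary illustrative values, not trained ones."""
    return math.tanh(w_x * x + w_h * h)

def run(inputs, carry_hidden):
    """Run the cell over a sequence, optionally resetting h before each step."""
    h = 0.0
    outputs = []
    for x in inputs:
        if not carry_hidden:
            h = 0.0  # reset: the cell forgets everything before this step
        h = rnn_step(x, h)
        outputs.append(h)
    return outputs

# A single nonzero input followed by zeros.
inputs = [1.0, 0.0, 0.0]
carried = run(inputs, carry_hidden=True)   # hidden state passed along
reset = run(inputs, carry_hidden=False)    # hidden state zeroed every step

print("carried:", carried)
print("reset:  ", reset)
```

With the state carried, the first input still influences the later outputs. With the reset, every step sees only its own input, so the zero inputs produce zero outputs and the earlier signal is lost.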
-
Yes, this is okay to do, but check whether you need to adjust the learning rate.