How to initialize the hidden units of an LSTM?

If I understand correctly, it is common practice during training to initialize the hidden units of the LSTM with a new set of random values for each sequence. Why is this better than using the same random values every time?

In particular, how should this be handled at application time (i.e., inference, with no training)? Could different ways of initializing the hidden units at application time lead to different predictions? The LSTM equations suggest that this is possible, since the initial state enters the recurrence like any other state. So why not use the same fixed set of random values for all runs through the LSTM, both in training and at application time?
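To make the question concrete: here is a toy NumPy sketch of a single LSTM cell (randomly chosen weights, purely illustrative, not any particular library's implementation) that runs the same input sequence from a zero initial state and from a random initial state, showing that the final hidden state does depend on the initialization:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One fixed set of weights, as at application time (training already done).
# Gates are stacked as [input, forget, output, candidate].
W = rng.standard_normal((4 * n_hid, n_in + n_hid)) * 0.5
b = np.zeros(4 * n_hid)

def lstm_step(x, h, c):
    z = W @ np.concatenate([x, h]) + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c_new = f * c + i * g          # cell state update
    h_new = o * np.tanh(c_new)     # hidden state output
    return h_new, c_new

def run(seq, h0, c0):
    h, c = h0, c0
    for x in seq:
        h, c = lstm_step(x, h, c)
    return h

seq = [rng.standard_normal(n_in) for _ in range(5)]

h_from_zero = run(seq, np.zeros(n_hid), np.zeros(n_hid))
h_from_rand = run(seq, rng.standard_normal(n_hid), np.zeros(n_hid))

# The two runs disagree: the initial hidden state does affect the prediction.
print(np.abs(h_from_zero - h_from_rand).max())
```

The difference does shrink as the sequence gets longer (the forget gate repeatedly attenuates the initial state), but for short sequences it is clearly nonzero, which is what motivates my question about which initialization to use at application time.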

Would using the default initialization (a zero vector) be harmful in any way, and if so, why?