Padding does affect the hidden state. However, in many cases (e.g., when the input lengths are quite close), the effect is small.
If a batch consists of one very long sequence and some very short ones, then the short ones have to be padded with many zeros. Running the RNN over a short sequence followed by that many zeros will then be nearly the same as running it over an all-zero sequence, right?
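
A quick way to check this intuition is a toy experiment. The sketch below is only an illustration under stated assumptions: it uses a scalar vanilla RNN, `h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)`, with hand-picked weights (`w_x`, `w_h`, `b` and the helper `rnn_final_hidden` are hypothetical, not taken from any library), and compares the final hidden state with and without zero padding.

```python
import math

def rnn_final_hidden(xs, w_x=0.5, w_h=0.8, b=0.1):
    """Run a scalar vanilla RNN over xs and return the final hidden state.

    Weights are arbitrary illustrative values (an assumption, not trained).
    """
    h = 0.0
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)
    return h

short = [1.0, -2.0, 0.5]                         # the "real" short sequence

h_true = rnn_final_hidden(short)                 # state after the last real step
h_pad2 = rnn_final_hidden(short + [0.0] * 2)     # lightly padded
h_pad50 = rnn_final_hidden(short + [0.0] * 50)   # heavily padded
h_zeros = rnn_final_hidden([0.0] * 53)           # all zeros, same total length

print("no padding:      ", h_true)
print("2 pad steps:     ", h_pad2)
print("50 pad steps:    ", h_pad50)
print("all-zero input:  ", h_zeros)
```

With these weights the heavily padded sequence ends up in almost the same state as the all-zero sequence, while the state after the last real step is clearly different. A trained RNN or an LSTM may retain information for longer, so the size of the effect depends on the model.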