Thoughts about RNN training with different batch_size

After reading the RNN source code in PyTorch and some blog posts about RNNs, I want to verify my understanding of RNN training. Namely, that if we change batch_size from 1 to 4, the training process is totally different.

Here is the explanation of nn.RNN() in the PyTorch documentation:

> h_n of shape (num_layers * num_directions, batch, hidden_size)

If batch_size = 1, then we only get one init hidden state (typically a tensor of all zeros), while if batch_size = 4, we get four init hidden states. That means the input hidden state for the fourth word is now different from what it was in the batch_size = 1 training process.
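As a sketch of the shapes involved (assuming a single-layer, unidirectional RNN, so num_layers * num_directions = 1, and a made-up hidden_size of 16):

```python
num_layers, num_directions, hidden_size = 1, 1, 16

def h_n_shape(batch):
    # h_n shape per the docs: (num_layers * num_directions, batch, hidden_size)
    return (num_layers * num_directions, batch, hidden_size)

print(h_n_shape(1))  # one init/final hidden state
print(h_n_shape(4))  # one hidden state per batch element
```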

Here is my reasoning. In the former case (batch_size = 1), the input hidden state for the fourth word contains the information of the first three words, while in the latter case it is all zeros. Therefore, the two training processes are totally different.
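The two scenarios I mean can be sketched in pure Python with a toy single-unit RNN cell (the weights are made-up constants, and I am assuming the batch of 4 is treated as four independent length-1 sequences):

```python
import math

# Toy single-unit RNN cell: h' = tanh(w_ih * x + w_hh * h).
# W_IH and W_HH are made-up scalar weights for illustration only.
W_IH, W_HH = 0.5, 0.8

def rnn_step(x, h):
    return math.tanh(W_IH * x + W_HH * h)

words = [1.0, 2.0, 3.0, 4.0]  # four "word" inputs

# Case 1: one sequence of length 4 (batch_size = 1).
# The hidden state is carried from word to word.
h = 0.0  # single init hidden state (zero)
hidden_before_each_word = []
for x in words:
    hidden_before_each_word.append(h)
    h = rnn_step(x, h)

# Case 2: the same four words as a batch of four length-1
# sequences (batch_size = 4): each word gets its own zero
# init hidden state.
batch_hidden_before = [0.0, 0.0, 0.0, 0.0]

# The fourth word sees different input hidden states:
print(hidden_before_each_word[3])  # nonzero: carries info from words 1-3
print(batch_hidden_before[3])      # 0.0
```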

Can anybody tell me whether my thoughts are right or not?