Purpose of hidden state in beam search

Perhaps I am still wrapping my head around beam search.
I completely understand how it is described here.
What I don’t understand is why we need to keep track of hidden states as opposed to keeping track of only the top k predictions. What am I missing? The pseudocode is below. I’ve also been looking at this example.

for each element in the test-set
    calculate initial k (decoder-encoder step)
    for range(timesteps-1)
        for each prev k
            get hidden state
            obtain its best k
            save hidden state
        find new k from k*k possible ones
        # update hypotheses based on the newly found k
        for element in k 
            copy hidden state
            change hypotheses if necessary
            append new k to hypotheses

Is it because, when I feed each of the k words as input to the next timestep, I also need to pass along the hidden state that belongs to that word's hypothesis, so that the decoder can predict the next word in the sentence?
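
To make the pseudocode above concrete, here is a minimal Python sketch of the loop. It is an illustration under assumptions, not the code from the linked example: `toy_step` is a made-up stand-in for one decoder step (a real model would be an RNN/GRU/LSTM cell), and `VOCAB`, `beam_search`, and the integer "hidden state" are all hypothetical names chosen for this sketch. The point it tries to show is that each hypothesis carries its own hidden state, because the decoder only sees the last token plus that state, and the state is what summarizes the rest of the prefix.

```python
import math

VOCAB = 4  # hypothetical toy vocabulary size


def toy_step(token, hidden):
    """Made-up stand-in for one decoder step.

    Takes the last token and the hidden state, returns
    (log-probabilities over the vocabulary, new hidden state).
    """
    # The new hidden state depends on the old one, so it encodes the whole prefix.
    new_hidden = (hidden * 31 + token + 1) % 97
    scores = [((new_hidden + 1) * (v + 3)) % 11 + 1 for v in range(VOCAB)]
    total = sum(scores)
    log_probs = [math.log(s / total) for s in scores]
    return log_probs, new_hidden


def beam_search(step, start_token, init_hidden, k, timesteps):
    # Each hypothesis is (tokens, cumulative log-prob, hidden state).
    # The stored hidden state is the one to feed together with tokens[-1].
    beam = [([start_token], 0.0, init_hidden)]
    for _ in range(timesteps):
        candidates = []
        for tokens, score, hidden in beam:
            # Without `hidden`, the step would only know the last token,
            # not the words that came before it in this hypothesis.
            log_probs, new_hidden = step(tokens[-1], hidden)
            # Keep this hypothesis' k best continuations.
            best = sorted(range(len(log_probs)),
                          key=lambda v: log_probs[v], reverse=True)[:k]
            for v in best:
                # Every continuation inherits the hidden state of its parent.
                candidates.append((tokens + [v], score + log_probs[v], new_hidden))
        # Find the new k among the (up to) k*k candidates.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = candidates[:k]
    return beam


if __name__ == "__main__":
    for tokens, score, hidden in beam_search(toy_step, 0, 0, k=2, timesteps=3):
        print(tokens, round(score, 3), hidden)
```

Note that when a hypothesis is dropped from the beam, its hidden state is dropped with it, and when one parent contributes several survivors, they all start from a copy of the same parent state. That matches the "copy hidden state" line in the pseudocode.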