presumably because the elements of the sequence are fed to the GRU one at a time, which is necessary given that at each timestep you want to feed the output of the previous timestep back into the GRU.
In this case we need seq_len=1 because the output of the previous timestep is used as the input to the next, so the next input isn't known until the previous step has run. However, when all the input timesteps are available in advance, you could feed the GRU the entire sequence in one go.
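To make the difference concrete, here is a rough sketch using `torch.nn.GRU` with its default `(seq_len, batch, input_size)` layout. The sizes and variable names are made up for illustration. I've also assumed `hidden_size == input_size` so the raw output can be fed straight back in; a real decoder would normally turn the output into a token and embed it first.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes (assumptions, not taken from the original code).
input_size, hidden_size, seq_len, batch = 4, 4, 5, 1
gru = nn.GRU(input_size, hidden_size)  # expects (seq_len, batch, input_size)

inputs = torch.randn(seq_len, batch, input_size)
h0 = torch.zeros(1, batch, hidden_size)  # (num_layers, batch, hidden_size)

with torch.no_grad():
    # Case 1: all inputs known in advance, so one call handles the whole sequence.
    out_full, h_full = gru(inputs, h0)

    # Case 2: the same inputs fed one timestep per call (seq_len=1 each time),
    # carrying the hidden state forward by hand.
    h_step = h0
    step_outputs = []
    for t in range(seq_len):
        out_t, h_step = gru(inputs[t:t + 1], h_step)  # slice keeps shape (1, batch, input_size)
        step_outputs.append(out_t)
    out_stepwise = torch.cat(step_outputs, dim=0)

    # Case 3: each input is the previous output, so there is no way to avoid
    # the loop: the next input does not exist until the current step has run.
    x, h_gen = inputs[0:1], h0
    generated = []
    for _ in range(seq_len):
        x, h_gen = gru(x, h_gen)
        generated.append(x)
    generated = torch.cat(generated, dim=0)

# Cases 1 and 2 should agree up to floating-point noise.
same = torch.allclose(out_full, out_stepwise, atol=1e-5)
print(same)
```

Cases 1 and 2 compute the same thing; the single call is just more convenient and usually faster. Case 3 is the situation described above, where seq_len=1 is forced by the data dependency rather than by the GRU itself.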