Why should I reshape a tensor before calling a loss function?

Here is the code:

    for i in range(len(target)):
        # GRU output
        output, hidden = model(target[i], hidden)

        # reshape a tensor, but why?
        (seq, bat, inp) = output.size()
        output = output.reshape(seq, inp, bat)

        # calculate loss
        loss = criterion(output, label[i].argmax(2)).to(device)

        # backpropagate and update parameters
        optimizer.zero_grad()
        loss.backward(retain_graph=True)
        optimizer.step()

So, the output of the GRU model's forward pass is a tensor of shape
(seq_len, batch_size, input_size).
I thought the labelled data for this input (the target in the loss function) should be a tensor of shape
(seq_len, batch_size).
In this code, the variable label is a one-hot vector of shape (seq_len, batch_size, input_size), so I called argmax(dim=2) to make it acceptable for the loss function, because it doesn't take a one-hot vector.
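
For reference, here is a minimal sketch of the shapes I mean (the sizes and the dummy tensors are made up just for illustration):

    import torch
    import torch.nn.functional as F

    # made-up sizes, not my real ones
    seq_len, batch_size, input_size = 5, 3, 10

    # what the GRU gives me, and one-hot labels of the same shape
    output = torch.randn(seq_len, batch_size, input_size)
    label = F.one_hot(torch.randint(0, input_size, (seq_len, batch_size)),
                      num_classes=input_size)

    target = label.argmax(dim=2)          # -> (seq_len, batch_size)
    print(output.shape, target.shape)     # [5, 3, 10] and [5, 3]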

But without reshaping the output tensor, this gives me an error:

    ValueError: Expected target size (seq_len, input_size), got torch.Size([seq_len, batch_size])
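
If it helps, I can reproduce the error with just the criterion call (assuming criterion is nn.CrossEntropyLoss here; the sizes are again made up):

    import torch
    import torch.nn as nn

    seq_len, batch_size, input_size = 5, 3, 10
    criterion = nn.CrossEntropyLoss()

    output = torch.randn(seq_len, batch_size, input_size)         # (seq_len, batch_size, input_size)
    target = torch.randint(0, input_size, (seq_len, batch_size))  # (seq_len, batch_size)

    loss = criterion(output, target)
    # ValueError: Expected target size (5, 10), got torch.Size([5, 3])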

I can actually solve this problem by reshaping the output tensor
to (seq_len, input_size, batch_size),
but why should I do that?
Isn't it natural to do this without reshaping?
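
For completeness, this is the reshaped call that runs without the error (same made-up setup as above):

    # dim 1 now has size input_size, which is the layout the criterion accepts
    reshaped = output.reshape(seq_len, input_size, batch_size)
    loss = criterion(reshaped, target)    # target stays (seq_len, batch_size)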