So here is the code:

```python
for i in range(len(target)):
    # GRU forward pass
    output, hidden = model(target[i], hidden)
    # reshape the output tensor -- but why is this needed?
    (seq, bat, inp) = output.size()
    output = output.reshape(seq, inp, bat)
    # compute the loss against class indices
    loss = criterion(output, label[i].argmax(2)).to(device)
    # backpropagate and update the parameters
    optimizer.zero_grad()
    loss.backward(retain_graph=True)
    optimizer.step()
```

So, the output of the GRU model's forward pass is a tensor of shape (seq_len, batch_size, input_size).

I thought the labelled data for this input (the target in the loss function) should be a tensor of shape (seq_len, batch_size). In this code, the variable label is a one-hot vector of shape (seq_len, batch_size, input_size), so I called argmax(dim=2) to make it acceptable to the loss function, because it doesn't take a one-hot vector.
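For context, here is a minimal sketch of that conversion with made-up sizes (5, 3, 10 are just placeholders for my real dimensions):

```python
import torch

seq_len, batch_size, input_size = 5, 3, 10  # placeholder sizes

# a dummy one-hot label tensor, shaped like my label[i]
one_hot = torch.zeros(seq_len, batch_size, input_size)
one_hot[..., 0] = 1.0

# argmax over the class dimension turns one-hot vectors into class indices
indices = one_hot.argmax(dim=2)
print(indices.shape)  # torch.Size([5, 3]), i.e. (seq_len, batch_size)
```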

But without reshaping the output tensor, this gives me an error:

ValueError: Expected target size (seq_len, input_size), got torch.Size([seq_len, batch_size])

I can actually make this error go away by reshaping the output tensor to (seq_len, input_size, batch_size), but why should I do that?

Isn't it natural to do this without reshaping?
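In case it helps, here is a minimal repro of the shape problem with dummy data (I'm assuming criterion is nn.CrossEntropyLoss, and the sizes are placeholders):

```python
import torch
import torch.nn as nn

seq_len, batch_size, input_size = 5, 3, 10  # placeholder sizes
criterion = nn.CrossEntropyLoss()

output = torch.randn(seq_len, batch_size, input_size)         # like the GRU output
target = torch.randint(0, input_size, (seq_len, batch_size))  # class indices

# criterion(output, target)  # fails: expected target size (5, 10), got (5, 3)

# After moving the class scores to dimension 1 -- CrossEntropyLoss expects
# input (N, C, d1) against a target (N, d1) -- the call succeeds:
loss = criterion(output.reshape(seq_len, input_size, batch_size), target)
print(loss)

# Note: permute(0, 2, 1) would give the same (5, 10, 3) shape while keeping
# each score attached to its original (timestep, batch) position, whereas
# reshape only reinterprets the underlying memory.
```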