Here is the code:
for i in range(len(target)):
    # GRU forward pass
    output, hidden = model(target[i], hidden)
    # reshape the output tensor (but why is this needed?)
    (seq, bat, inp) = output.size()
    output = output.reshape(seq, inp, bat)
    # calculate the loss
    loss = criterion(output, label[i].argmax(2)).to(device)
    # backpropagate and update the parameters
    optimizer.zero_grad()
    loss.backward(retain_graph=True)
    optimizer.step()
The output of the GRU model's forward pass is a tensor of shape (seq_len, batch_size, input_size). I thought the target of the loss function (the labelled data for this input) should be a tensor of shape (seq_len, batch_size).
In this code, the variable label is a one-hot tensor of shape (seq_len, batch_size, input_size), so I called argmax(dim=2) to convert it to class indices, because the loss function doesn't accept one-hot vectors.
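For reference, here is a minimal sketch of that argmax conversion (the sizes are made-up toy values, not the ones from my actual model):

```python
import torch

seq_len, batch_size, input_size = 4, 2, 5  # toy sizes for illustration only

# a one-hot label tensor of shape (seq_len, batch_size, input_size)
label = torch.zeros(seq_len, batch_size, input_size)
label[:, :, 3] = 1.0  # every position is class 3 in this toy example

# argmax over dim=2 collapses the one-hot dimension into class indices,
# giving a tensor of shape (seq_len, batch_size)
indices = label.argmax(dim=2)
print(indices.shape)  # torch.Size([4, 2])
```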
But without reshaping the output tensor, this gives me an error:

ValueError: Expected target size (seq_len, input_size), got torch.Size([seq_len, batch_size])
I can actually make the error go away by reshaping the output tensor to (seq_len, input_size, batch_size), but why do I have to do that? Isn't it natural for this to work without reshaping?
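For anyone who wants to reproduce this, here is a minimal, self-contained sketch of the shape mismatch, assuming criterion is nn.CrossEntropyLoss (toy sizes and random data, not my actual model; permute is shown only as one way of moving the class dimension to dim 1, which is where the loss function looks for it):

```python
import torch
import torch.nn as nn

seq_len, batch_size, num_classes = 4, 2, 5  # toy sizes for illustration
criterion = nn.CrossEntropyLoss()

output = torch.randn(seq_len, batch_size, num_classes)       # GRU-style output
target = torch.randint(0, num_classes, (seq_len, batch_size))  # class indices

# This fails: with a 3-D input, CrossEntropyLoss interprets the shape as
# (N, C, d1) with the class dimension at dim 1, so it expects a target of
# shape (N, d1) = (seq_len, num_classes), not (seq_len, batch_size).
try:
    criterion(output, target)
except (RuntimeError, ValueError) as e:
    print("shape error:", e)

# Moving the class dimension to dim 1 makes the shapes line up:
loss = criterion(output.permute(0, 2, 1), target)
print(loss.item())
```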