How to train a GRU model with the correct forward function?

There are N sequences of data, with shape (N sequences, length of each sequence, features of each row). Each row corresponds to a class.
I want to use a GRU model to predict the class of each row in each sequence (1, length, features).

But I am confused about how to update the parameters. I thought of two ways:

1. Make predictions for a whole sequence (1, length, features) at once, calculate the loss over all of them together (the number of predictions equals the length of the sequence), then update the parameters.

2. Predict the class of each row in a sequence one by one, calculating the loss and updating the parameters after each row.
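One concrete difference between the two schemes is how far gradients travel back through the hidden state. The sketch below is only an illustration under assumptions: the sequence is split into two hypothetical chunks, the input is unbatched `(length, features)` (which needs a recent PyTorch), and the sizes 100 and 200 are taken from the model code further down. It compares a single graph over the whole sequence with a hidden state that is cut by `detach()` between chunks.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.GRU(100, 200, 1)
seq = torch.randn(10, 100)            # hypothetical sequence: (length, features)
first, second = seq[:5], seq[5:]      # two chunks of the same sequence

# Scheme 1: one graph over the whole sequence.
# The loss on the second chunk still sends gradients back to the initial state.
h0 = torch.zeros(1, 200, requires_grad=True)
_, h_mid = rnn(first, h0)
out, _ = rnn(second, h_mid)
out.sum().backward()
grad_reaches_start = h0.grad is not None and bool(h0.grad.abs().sum() > 0)

# Scheme 2: hidden state detached between chunks.
# The value of the state is carried over, but the graph behind it is cut.
h0_b = torch.zeros(1, 200, requires_grad=True)
_, h_mid_b = rnn(first, h0_b)
out_b, _ = rnn(second, h_mid_b.detach())
out_b.sum().backward()
grad_blocked = h0_b.grad is None

print(grad_reaches_start, grad_blocked)
```

With `detach()`, each update only sees the steps since the last cut, which is usually called truncated backpropagation through time. Without it, every update backpropagates through everything that produced the current hidden state.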

I found that after training with (1), the model makes the same predictions regardless of the input data (even for random data).
(2) is better, but the loss still decreases only a little.

What is the difference between the two methods?

model code:

``````
#...
self.rnn = nn.GRU(100, 200, 1)  # input_size=100, hidden_size=200, num_layers=1
self.fc = nn.Linear(200, 7)  # GRU hidden size in, 7 classes out
#...
``````
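A quick shape check can confirm that the two layers fit together. This is a minimal sketch, assuming an unbatched input of shape `(length, features)` (accepted by recent PyTorch versions) and a hypothetical sequence length of 12. `nn.GRU(100, 200, 1)` emits 200 features per step, so the linear layer on top needs `in_features=200` to produce 7 class scores per row.

```python
import torch
import torch.nn as nn

rnn = nn.GRU(100, 200, 1)     # input_size=100, hidden_size=200, num_layers=1
fc = nn.Linear(200, 7)        # in_features matches the GRU hidden size

seq = torch.randn(12, 100)    # hypothetical unbatched sequence: (length, features)
h_0 = torch.zeros(1, 200)     # (num_layers, hidden_size)

output, h_n = rnn(seq, h_0)   # output: one 200-dim vector per row
logits = fc(output)           # one 7-dim score vector per row

print(output.shape, h_n.shape, logits.shape)
```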

for (1):

``````
def forward(self, seqs):
    h_n = torch.zeros(1, 200, dtype=torch.float).cuda()
    outputs = []
    for seq in seqs:
        output, h_n = self.rnn(seq, h_n)  # hidden state is carried over to the next sequence
        output = self.fc(output)
        output = F.softmax(output, dim=-1)
        outputs += [output]
    return torch.cat(outputs)  # assumed: one (total rows, 7) tensor for the loss

#... training
outputs = model(seqs)
loss = loss_fn(outputs, targets) # loss_fn = nn.CrossEntropyLoss()
loss.backward()
optimizer.step()
``````
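For comparison, here is a minimal sketch of approach (1) on a single sequence. It rests on several assumptions: `RowClassifier`, the random data, and the Adam optimizer are hypothetical stand-ins, the input is unbatched `(length, features)`, and the hidden state starts from zeros for each sequence. The one deliberate difference from the code above is that the model returns raw scores without `F.softmax`, because `nn.CrossEntropyLoss` already applies log-softmax internally and expects unnormalized logits.

```python
import torch
import torch.nn as nn

class RowClassifier(nn.Module):  # hypothetical name, not from the post
    def __init__(self, n_features=100, hidden=200, n_classes=7):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, 1)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, seq):
        # seq: (length, n_features); omitting h_0 starts from a zero hidden state
        output, _ = self.rnn(seq)
        return self.fc(output)   # raw logits, shape (length, n_classes)

torch.manual_seed(0)
model = RowClassifier()
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

seq = torch.randn(12, 100)                # made-up sequence of 12 rows
target = torch.randint(0, 7, (12,))       # made-up class index per row

losses = []
for _ in range(20):
    optimizer.zero_grad()                 # clear gradients from the previous step
    logits = model(seq)                   # predictions for the whole sequence
    loss = loss_fn(logits, target)        # loss averaged over all rows
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(losses[0], losses[-1])
```

Predicted classes can be read off with `logits.argmax(dim=-1)`. A softmax is only needed when actual probabilities are wanted, and then outside the loss computation.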

for (2):

``````
def forward(self, seq, h_n):
    output, h_n = self.rnn(seq, h_n)
    output = self.fc(output)
    return F.softmax(output, dim=-1), h_n

#... training
h_n = torch.zeros(1, 200, dtype=torch.float).cuda()
for seq in seqs:
    h_n = h_n.detach()  # keep the state values, drop the graph from earlier steps
    output, h_n = model(seq, h_n)
    loss = loss_fn(output, target) # loss_fn = nn.CrossEntropyLoss()
    loss.backward()