How to train GRU model with correct forward function?

there are N sequences of data with the shape of (N sequences, length of each sequence, features of each row). each row is corresponded to a class.
I want to use the GRU model to predict the class for each row in each sequence (1, length, features).

But I am confused with updating parameters. Two ways thought:

  1. make predictions of the whole of a sequence (1, length, features). calculate the loss together (the length of the predictions = the length of a sequence). then update the parameters.

  2. make prediction of each class of each row in a sequence one by one. calculate the loss for each row and update.

I found that after training with (1), the model will makes the same predictions regardless of input data (even random data)
(2) is better, but still the loss decreased a little.

what is the difference with the two methods?

model code:

#...
self.rnn = nn.GRU(100, 200, 1)
self.fc = nn.Linear(1, 7)  # there are 7 classes
#...

for (1):

def forward(self, seqs):
     h_n = torch.zeros(1, 200, dtype=torch.float).cuda()
     outputs = []
     for seq in seqs:
          output, h_n = self.rnn(seq, h_n)
          output = self.fc(output)
          output = F.softmax(output)
          outputs += [output]
     return torch.cat(outputs, dim = 1)

#... training
outputs = model(seqs)
loss = loss_fn(outputs, targets) # loss_fn = nn.CrossEntropyLoss()
loss.backward()
optimizer.zero_grad()
optimizer.step()

for (2):

def forward(self, seq, h_n):
     output, h_n = self.rnn(seq, h_n)
     output = self.fc(output)
     return F.softmax(output), h_n

#... training
h_n = torch.zeros(1, 200, dtype=torch.float).cuda()
for seq in seqs:
     h_n = h_n.detach()
     output, h_n = model(seq, h_n)
     loss = loss_fn(output, target) # loss_fn = nn.CrossEntropyLoss()
     loss.backward()
     optimizer.zero_grad()
     optimizer.step()