There are N sequences of data with shape (N sequences, length of each sequence, features of each row). Each row corresponds to a class.
I want to use a GRU model to predict the class of each row in each sequence (1, length, features).
But I am confused about how to update the parameters. I thought of two ways:
(1) Make predictions for a whole sequence (1, length, features), calculate the loss over all rows together (the number of predictions = the length of the sequence), then update the parameters.
(2) Predict the class of each row in a sequence one by one, calculate the loss for each row, and update.
I found that after training with (1), the model makes the same predictions regardless of the input data (even random data).
(2) is better, but the loss still decreases only a little.
What is the difference between the two methods?
model code:
#...
self.rnn = nn.GRU(100, 200, 1)
self.fc = nn.Linear(1, 7) # there are 7 classes
#...
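A quick shape check, as a minimal sketch: with `nn.GRU(100, 200, 1)` the last dimension of the output is the hidden size (200), so a linear layer applied directly to that output would need `in_features=200`. Whether `nn.Linear(1, 7)` above is a typo or the elided code reshapes the output first is not clear from the snippet, so the `fc` layer and the sequence length of 12 below are assumptions for illustration. The 2-D (unbatched) input relies on a reasonably recent PyTorch version.

```python
import torch
import torch.nn as nn

rnn = nn.GRU(100, 200, 1)      # input_size=100, hidden_size=200, num_layers=1
fc = nn.Linear(200, 7)         # assumption: fc consumes the GRU output directly

seq = torch.randn(12, 100)     # one unbatched sequence: (length, features)
h_0 = torch.zeros(1, 200)      # (num_layers, hidden_size) for unbatched input

output, h_n = rnn(seq, h_0)    # output holds one hidden vector per row
logits = fc(output)            # one score per class for every row
print(output.shape, h_n.shape, logits.shape)
```

If the shapes printed here differ from what the real model produces, the mismatch is probably in the elided parts of the model.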
for (1):
def forward(self, seqs):
    h_n = torch.zeros(1, 200, dtype=torch.float).cuda()
    outputs = []
    for seq in seqs:
        output, h_n = self.rnn(seq, h_n)
        output = self.fc(output)
        output = F.softmax(output)
        outputs += [output]
    return torch.cat(outputs, dim = 1)
#... training
outputs = model(seqs)
loss = loss_fn(outputs, targets) # loss_fn = nn.CrossEntropyLoss()
loss.backward()
optimizer.zero_grad()
optimizer.step()
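For comparison, here is a minimal sketch of one whole-sequence training step in the usual order: `zero_grad()`, then `backward()`, then `step()`. In the training code above `zero_grad()` runs after `backward()`, so the gradients are cleared before `step()` can use them, which may be why the predictions do not depend on the input. The sketch also passes raw logits to `nn.CrossEntropyLoss`, which applies log-softmax internally, so no extra softmax is used. `TinyGRUTagger`, the SGD optimizer, the random data, the sizes, and the choice to treat the sequences as an independent batch are all assumptions for illustration, not the original model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyGRUTagger(nn.Module):  # hypothetical stand-in for the model above
    def __init__(self, n_features=100, hidden=200, n_classes=7):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, 1, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, seqs):
        output, _ = self.rnn(seqs)   # (N, length, hidden); h_0 defaults to zeros
        return self.fc(output)       # raw logits, no softmax

model = TinyGRUTagger()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

seqs = torch.randn(4, 12, 100)            # (N sequences, length, features)
targets = torch.randint(0, 7, (4, 12))    # one class index per row

before = model.fc.weight.detach().clone()

optimizer.zero_grad()                     # clear old gradients first
logits = model(seqs)                      # (4, 12, 7)
# CrossEntropyLoss expects (samples, classes) and (samples,), so flatten rows
loss = loss_fn(logits.reshape(-1, 7), targets.reshape(-1))
loss.backward()                           # compute fresh gradients
optimizer.step()                          # apply them

changed = not torch.equal(before, model.fc.weight.detach())
print(loss.item(), changed)
```

Comparing a parameter before and after `step()`, as done with `changed`, is a cheap way to confirm that an update is actually being applied.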
for (2):
def forward(self, seq, h_n):
    output, h_n = self.rnn(seq, h_n)
    output = self.fc(output)
    return F.softmax(output), h_n
#... training
h_n = torch.zeros(1, 200, dtype=torch.float).cuda()
for seq in seqs:
    h_n = h_n.detach()
    output, h_n = model(seq, h_n)
    loss = loss_fn(output, target) # loss_fn = nn.CrossEntropyLoss()
    loss.backward()
    optimizer.zero_grad()
    optimizer.step()
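A minimal sketch of the row-by-row variant with the optimizer calls in the usual order, again under assumptions (standalone `rnn` and `fc` modules, SGD, random data, a sequence of 12 rows). Because the hidden state is detached before every step, each update only backpropagates through the current row, whereas a loss over the whole sequence backpropagates through all rows at once. That is one concrete difference between the two methods.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

rnn = nn.GRU(100, 200, 1)
fc = nn.Linear(200, 7)                   # assumption: fc takes the GRU output
params = list(rnn.parameters()) + list(fc.parameters())
optimizer = torch.optim.SGD(params, lr=0.1)
loss_fn = nn.CrossEntropyLoss()

seq = torch.randn(12, 100)               # one sequence: (length, features)
targets = torch.randint(0, 7, (12,))     # one class index per row

before = rnn.weight_hh_l0.detach().clone()
h_n = torch.zeros(1, 200)                # (num_layers, hidden) for unbatched input
losses = []
for row, target in zip(seq, targets):
    h_n = h_n.detach()                   # cut the graph to earlier rows
    output, h_n = rnn(row.unsqueeze(0), h_n)   # row as a length-1 sequence
    logits = fc(output)                  # (1, 7), raw logits
    loss = loss_fn(logits, target.unsqueeze(0))
    optimizer.zero_grad()                # clear, then compute, then apply
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(len(losses), h_n.shape)
```

Updating after every row gives many small, noisy updates per sequence; detaching every few rows instead of every row would be a middle ground between the two methods.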