My forward function looks like:
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

def forward(self, dict_index, features, prev_hidden, seq_sizes, original_index):
    # embed the token indices and append the extra per-step features
    i2e = self.embedding(dict_index)
    data = torch.cat((i2e, features), 2)
    # pack (the batch is already sorted by decreasing length), run the RNN, unpack
    packed = pack_padded_sequence(data, list(seq_sizes.data), batch_first=True)
    output, _ = self.rnn(packed, prev_hidden)
    output, _ = pad_packed_sequence(output, batch_first=True)
    # get the last time step of each sequence: idx has shape (B, 1, H)
    idx = (seq_sizes - 1).view(-1, 1).expand(output.size(0), output.size(2)).unsqueeze(1)
    decoded = output.gather(1, idx).squeeze()
    # restore the original (pre-sort) batch order
    decoded[original_index] = decoded
    return decoded
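For reference, these are roughly the shapes involved (dummy numbers, not my real data; seq_sizes is sorted in decreasing order as pack_padded_sequence expects, and original_index records where each sorted row sits in the original batch):

import torch

batch, max_len, feat_dim = 4, 7, 3
dict_index = torch.randint(0, 100, (batch, max_len))    # token ids, (B, T)
features = torch.randn(batch, max_len, feat_dim)        # extra per-step features, (B, T, F)
seq_sizes = torch.tensor([7, 5, 3, 2])                  # true lengths, sorted descending
original_index = torch.tensor([2, 0, 3, 1])             # original position of each sorted row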
and then I compare the decoded vectors with a cosine embedding loss. Will PyTorch compute the correct gradients on backward? Do I need to modify something? Since I'm changing the order of the data in the last step of the forward, do I also need to "reorder" the loss values?
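To make the question concrete, here is a toy, standalone version of just the reordering step, written out-of-place instead of in-place (dummy shapes and values, not my real model). Is this the kind of change that would make backward behave correctly?

import torch

decoded = torch.randn(4, 5, requires_grad=True)   # stand-in for the gathered RNN output, (B, H)
original_index = torch.tensor([2, 0, 3, 1])       # where each sorted row belongs in the original batch

# out-of-place equivalent of `decoded[original_index] = decoded`
restored = torch.zeros_like(decoded).index_copy(0, original_index, decoded)

restored.sum().backward()
print(decoded.grad)   # all ones, so every row's gradient survives the reorder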