You are using a separate Linear layer, with its own parameters, for each timestep. Instead, you can share a single Linear layer across all timesteps. Here is what I would do:
class RNNS2S(nn.Module):
    def __init__(self, ...):
        ...
        # one Linear layer, shared across all timesteps
        self.fc = nn.Linear(in_features=self.hidden_dimensions, out_features=self.num_classes)

    def forward(self, X):
        ...
        output_gru, h_n = self.gru(X, h_0)
        # output_gru has shape (batch_size, seq_len, hidden_dimensions)
        # nn.Linear operates on the last dimension of its input,
        # i.e. for each slice [i, j, :] of output_gru it produces a vector of size num_classes
        fc_output = self.fc(output_gru)
        # fc_output has shape (batch_size, seq_len, num_classes)
        return fc_output
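For reference, here is a minimal runnable version of the same model; everything beyond the original snippet (the constructor arguments, the default sizes, and batch_first=True on the GRU) is an assumption filled in for illustration:

import torch
import torch.nn as nn

class RNNS2S(nn.Module):
    # hypothetical constructor arguments; the original elides them with "..."
    def __init__(self, input_size=4, hidden_dimensions=32, num_layers=1, num_classes=3):
        super().__init__()
        self.hidden_dimensions = hidden_dimensions
        self.num_classes = num_classes
        # batch_first=True so the GRU consumes (batch_size, seq_len, features)
        self.gru = nn.GRU(input_size=input_size, hidden_size=hidden_dimensions,
                          num_layers=num_layers, batch_first=True)
        self.fc = nn.Linear(in_features=self.hidden_dimensions, out_features=self.num_classes)

    def forward(self, X):
        # omitting h_0 makes the GRU start from a zero hidden state
        output_gru, h_n = self.gru(X)
        fc_output = self.fc(output_gru)  # (batch_size, seq_len, num_classes)
        return fc_output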
Now, I can produce input_data of shape (batches, timesteps, 4) along with targets of shape (batches, timesteps, 1), and if I feed input_data into the model I will get output of shape (batches, timesteps, num_classes), i.e. one prediction for each timestep of each sample in the batch. Then I can use a standard PyTorch loss function such as CrossEntropyLoss to compare fc_output to my targets. One caveat: CrossEntropyLoss expects logits of shape (N, C) and integer class indices of shape (N,), so the two tensors need a reshape before the comparison, as sketched below.
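Here is a minimal sketch of that loss computation; the concrete sizes (8 samples, 10 timesteps, 3 classes) are made-up values matching the hypothetical constructor above:

model = RNNS2S(input_size=4, hidden_dimensions=32, num_layers=1, num_classes=3)
criterion = nn.CrossEntropyLoss()

input_data = torch.randn(8, 10, 4)          # (batches, timesteps, 4)
targets = torch.randint(0, 3, (8, 10, 1))   # (batches, timesteps, 1), integer class ids

fc_output = model(input_data)               # (batches, timesteps, num_classes)

# flatten batch and time together so logits are (N, C) and targets are (N,)
loss = criterion(fc_output.reshape(-1, model.num_classes), targets.reshape(-1))
loss.backward()

Equivalently, you could permute the logits to (batches, num_classes, timesteps) and squeeze the targets to (batches, timesteps), since CrossEntropyLoss also accepts extra trailing dimensions after the class dimension.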