RNN for a many-to-many classification task

You are using a separate Linear layer with different parameters for each timestep.

Here is what I would do instead: use a single Linear layer and apply it to every timestep.

class RNNS2S(nn.Module):
    def __init__(self, ...):
        super().__init__()
        # self.gru, self.hidden_dimensions and self.num_classes are set up here;
        # the GRU is assumed to use batch_first=True
        self.fc = nn.Linear(in_features=self.hidden_dimensions, out_features=self.num_classes)

    def forward(self, X):
        # h_0 is the initial hidden state (defined elsewhere); omit it to start from zeros
        output_gru, h_n = self.gru(X, h_0)
        # output_gru has shape (batch_size, seq_len, hidden_dimensions)
        # nn.Linear operates on the last dimension of its input,
        # i.e. for each slice [i, j, :] of output_gru it produces a vector of size num_classes
        fc_output = self.fc(output_gru)
        # fc_output has shape (batch_size, seq_len, num_classes)
        return fc_output
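
To check that one nn.Linear really is shared across timesteps, here is a minimal sketch that compares a single call on the whole 3-D tensor with a loop over timesteps. The sizes are arbitrary assumptions, and a random tensor stands in for the GRU output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Arbitrary sizes, chosen for illustration only.
batch_size, seq_len, hidden_dimensions, num_classes = 3, 5, 8, 2

fc = nn.Linear(in_features=hidden_dimensions, out_features=num_classes)

# Random stand-in for the GRU output (batch_first layout).
output_gru = torch.randn(batch_size, seq_len, hidden_dimensions)

# One call on the full tensor: the layer acts on the last dimension only.
all_at_once = fc(output_gru)

# The same layer applied to each timestep separately, then stacked along dim 1.
per_timestep = torch.stack(
    [fc(output_gru[:, t, :]) for t in range(seq_len)], dim=1
)

print(all_at_once.shape)
print(torch.allclose(all_at_once, per_timestep, atol=1e-6))
```

Both routes use the same weight and bias, so there is no need to keep one Linear layer per timestep.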

Now, I can produce input_data of shape (batches, timesteps, 4) along with targets of shape (batches, timesteps, 1). If I feed input_data into the model, I get output of shape (batches, timesteps, num_classes), i.e. one vector of class scores for each timestep of each sample in the batch.

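Then I can use a standard PyTorch loss function such as CrossEntropyLoss to compare fc_output to my targets. Note that CrossEntropyLoss expects the class dimension in second position and integer class indices as targets, so either flatten fc_output to (batches*timesteps, num_classes) and the targets to (batches*timesteps), or permute fc_output to (batches, num_classes, timesteps) and drop the trailing target dimension.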
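
For reference, a self-contained sketch of the model together with the loss computation might look like the following. The constructor arguments and their defaults (input_size, hidden_dimensions, num_classes) are hypothetical, since the original elides them; the GRU is assumed to use batch_first=True and a zero initial hidden state; the data is random.

```python
import torch
import torch.nn as nn


class RNNS2S(nn.Module):
    # Constructor arguments are hypothetical; the original elides them.
    def __init__(self, input_size=4, hidden_dimensions=16, num_classes=2):
        super().__init__()
        self.gru = nn.GRU(input_size, hidden_dimensions, batch_first=True)
        # A single Linear layer, shared by all timesteps.
        self.fc = nn.Linear(hidden_dimensions, num_classes)

    def forward(self, X):
        # h_0 is omitted here, so the GRU starts from a zero hidden state.
        output_gru, h_n = self.gru(X)
        # (batch_size, seq_len, hidden) -> (batch_size, seq_len, num_classes)
        return self.fc(output_gru)


torch.manual_seed(0)
batches, timesteps, num_classes = 8, 10, 2

model = RNNS2S(num_classes=num_classes)
input_data = torch.randn(batches, timesteps, 4)
# Integer class indices, one per timestep, with a trailing dimension of 1.
targets = torch.randint(0, num_classes, (batches, timesteps, 1))

logits = model(input_data)
criterion = nn.CrossEntropyLoss()

# Option 1: flatten batch and time into one dimension.
loss = criterion(logits.reshape(-1, num_classes), targets.reshape(-1))

# Option 2: move the class dimension to position 1 and drop the
# trailing target dimension.
loss_alt = criterion(logits.permute(0, 2, 1), targets.squeeze(-1))

loss.backward()
print(logits.shape, loss.item())
```

With the default mean reduction, both options average over every timestep of every sample, so they give the same loss value.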