Does a linear layer after a GRU preserve the sequence output order?

I’m dealing with the following scenario:

  • My input has the shape of: [batch_size, input_sequence_length, input_features]

    input_sequence_length = 10

    input_features = 3

  • My output has the shape of: [batch_size, output_sequence_length]

    output_sequence_length = 5

i.e. for each input window of 10 time steps (each step with 3 features) I need to predict the values of the next 5 steps.

I built the following model:

import torch
import torch.nn as nn
import torchinfo

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.GRU = nn.GRU(input_size=3, hidden_size=32, num_layers=2, batch_first=True)
        self.fc  = nn.Linear(32, 5)
    def forward(self, input_series):
        output, h = self.GRU(input_series)                
        output    = output[:,  -1, :]       # get last state                
        output    = self.fc(output) 
        output    = output.view(-1, 5, 1)   # reorganize output
        return output
torchinfo.summary(MyModel(), (512, 10, 3))  

Layer (type:depth-idx)                   Output Shape              Param #
MyModel                                  [512, 5, 1]               --
├─GRU: 1-1                               [512, 10, 32]             9,888
├─Linear: 1-2                            [512, 5]                  165
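As a sanity check (a minimal, self-contained sketch of the model above), passing a random batch through the forward pass yields the output shape shown in the summary:

```python
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.GRU = nn.GRU(input_size=3, hidden_size=32, num_layers=2, batch_first=True)
        self.fc  = nn.Linear(32, 5)

    def forward(self, input_series):
        output, h = self.GRU(input_series)   # [batch, 10, 32]
        output    = output[:, -1, :]         # last time step: [batch, 32]
        output    = self.fc(output)          # [batch, 5]
        return output.view(-1, 5, 1)         # [batch, 5, 1]

model = MyModel()
x = torch.randn(512, 10, 3)   # [batch, seq_len, features]
y = model(x)
print(y.shape)  # torch.Size([512, 5, 1])
```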

I’m getting good results (very small MSE loss, and the predictions look good),

but I’m not sure whether the 5 output values of the model are really ordered in time,
i.e. whether the second output is based on the first output, the third on the second, and so on.

I know that the GRU output is based on the learned sequence history.
But I’m also using a linear layer, so is the output (after the linear layer) still ordered by time?

No, I don’t think the output is “sorted by time” since you have explicitly removed the time axis by indexing the last time step:

output    = output[:,  -1, :]       # get last state  

output now only contains the feature tensor of the last time step, which is then passed to the linear layer.
Due to the recurrent nature of the GRU, the last time step can contain information from the previous steps, but the actual tensor no longer has a temporal dimension.
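To make this concrete (a small sketch with a standalone GRU, using the same hyperparameters as the model above): the slice `output[:, -1, :]` is exactly the top layer’s final hidden state, a single feature vector per sample with no time axis left:

```python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=3, hidden_size=32, num_layers=2, batch_first=True)
x = torch.randn(4, 10, 3)        # [batch, seq_len, features]
output, h = gru(x)               # output: [4, 10, 32], h: [2, 4, 32]

last = output[:, -1, :]          # [4, 32] -- time axis is gone

# h stacks the final hidden state of each layer; h[-1] is the top layer,
# which is identical to the last time step of `output`
print(torch.allclose(last, h[-1]))  # True
```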

So although I’m getting good results, does the model make no sense, and would it be better to change it?

No, I don’t think the model does anything wrong, as the last state can still contain enough information from the previous time steps.

Can I count on the model outputs (a tensor of 5 values) being ordered?
I.e. output[4] based on output[3], output[3] based on output[2], …, output[1] based on output[0]?

No, that’s not how a linear layer works.
Right now, you take the 32 features of the last time step and pass them to the linear layer. The linear layer applies a weight matrix (and bias) to this input and “maps” the 32 features to 5. The 5 output features are not ordered or sorted in any way; they simply represent whatever you’ve defined as the model output, e.g. class logits.
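A quick sketch makes the independence explicit: each of the 5 outputs of `nn.Linear(32, 5)` is a separate dot product between the input and that unit’s own weight row, so no output unit depends on another output unit:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
fc = nn.Linear(32, 5)            # weight: [5, 32], bias: [5]
x = torch.randn(8, 32)           # e.g. the last GRU state for a batch of 8
y = fc(x)                        # [8, 5]

# Output unit k is computed only from x and row k of the weight matrix:
# y[:, k] = x @ W[k] + b[k]. Units 0..4 never see each other's values.
k = 3
y_k = x @ fc.weight[k] + fc.bias[k]
print(torch.allclose(y[:, k], y_k))  # True
```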
