Stateless RNN vs Stateful RNN

Does it make sense that a stateless RNN performed better than a stateful RNN?

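import numpy as np
import torch

hidden = None
y_pred = []
for x_i in x.tolist():
    # shape (1, 1): one timestep with one feature
    x_i = np.array([x_i])[:, np.newaxis]

    hidden = None  # This line is commented out in the stateful case.

    # add a batch dimension, giving shape (1, 1, 1)
    x_tensor = torch.Tensor(x_i).unsqueeze(0)
    prediction, hidden = rnn(x_tensor, hidden)
    # hidden =   (the right-hand side of this line is missing in the post)
    prediction = prediction.detach().numpy().flatten()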

Could you explain what exactly you mean by a stateless RNN and what network topology you are using? Is rnn in your case a cell or a complete RNN? Do you pass a whole sequence or only one timestep input?

If the initial hidden state is not passed (None), a zero vector is used internally as the first hidden state. If conditioning on the initial hidden state is not beneficial, it is possible that the ‘performance’ of the model is better than when an additional context vector is used.
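
The zero-vector default can be checked directly. This is a minimal sketch, assuming a small nn.RNN with arbitrary sizes chosen only for illustration (it is not the model from this thread):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Sizes are arbitrary and chosen only for illustration.
rnn = nn.RNN(input_size=1, hidden_size=4, num_layers=1, batch_first=True)
x = torch.randn(1, 5, 1)  # (batch, seq_len, input_size)

# An explicit initial state has shape (num_layers, batch, hidden_size).
zeros = torch.zeros(1, 1, 4)

with torch.no_grad():
    out_none, h_none = rnn(x, None)   # no initial state passed
    out_zero, h_zero = rnn(x, zeros)  # explicit zero initial state

# Both calls should give the same outputs and the same final hidden state.
same_output = torch.allclose(out_none, out_zero)
same_hidden = torch.allclose(h_none, h_zero)
print(same_output, same_hidden)
```

So resetting hidden to None before every call is the same as starting each step from an all-zero state.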

I suppose it’s a complete RNN.

By stateless, I mean that during evaluation (prediction mode) I pass hidden = None at each iteration instead of carrying over the hidden state returned by the previous call.
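
The two evaluation modes can be sketched roughly as follows. This is an illustration under assumptions, not the code from this thread: predict_stepwise and its stateful flag are hypothetical names, and the model is a bare nn.RNN with made-up sizes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy model; sizes are made up for illustration.
rnn = nn.RNN(input_size=1, hidden_size=4, batch_first=True)
seq = torch.randn(1, 6, 1)  # (batch, seq_len, input_size)


def predict_stepwise(model, seq, stateful):
    """Feed one timestep at a time, keeping or resetting the hidden state."""
    hidden = None
    outputs = []
    with torch.no_grad():
        for t in range(seq.size(1)):
            if not stateful:
                hidden = None  # stateless: every step starts from zeros
            step = seq[:, t:t + 1, :]  # shape (batch, 1, input_size)
            out, hidden = model(step, hidden)
            outputs.append(out)
    return torch.cat(outputs, dim=1)


with torch.no_grad():
    full_out, _ = rnn(seq)  # whole sequence in a single call

stateful_out = predict_stepwise(rnn, seq, stateful=True)
stateless_out = predict_stepwise(rnn, seq, stateful=False)

# Carrying the state step by step reproduces the full-sequence outputs.
matches_full = torch.allclose(stateful_out, full_out, atol=1e-6)
# Both modes agree on the first step, where the state is zero either way.
first_step_equal = torch.allclose(
    stateful_out[:, 0], stateless_out[:, 0], atol=1e-6
)
print(matches_full, first_step_equal)
```

In the stateless mode each prediction depends only on the current input, so the network cannot use information from earlier timesteps.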

Code for RNN class:

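import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, input_size, output_size, hidden_dim, n_layers):
        super(RNN, self).__init__()

        # store hidden_dim; forward() uses it to reshape the RNN output
        self.hidden_dim = hidden_dim

        # define an RNN with specified parameters
        self.rnn = nn.RNN(input_size, hidden_dim, n_layers, batch_first=True)
        # last, fully-connected layer
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, x, hidden):
        # x: (batch_size, seq_length, input_size)
        # hidden: (n_layers, batch_size, hidden_dim), or None for a zero initial state
        r_out, hidden = self.rnn(x, hidden)
        # flatten to (batch_size * seq_length, hidden_dim) for the linear layer
        r_out = r_out.view(-1, self.hidden_dim)
        output = self.fc(r_out)
        return output, hidden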

I ran the experiments again and could not reproduce the result.

As expected, the stateful version gave better results.