Recently I compared two models for a DQN agent on the CartPole-v0 environment: a three-layer multilayer perceptron (MLP), and a recurrent model built from an LSTM followed by one fully connected layer. I use an experience replay buffer of size 200,000, and training doesn't start until the buffer is full. The MLP solved the problem in a reasonable number of training steps (meaning it achieved a mean reward of 195 over the last 100 episodes), but the RNN model converged much more slowly, and its maximum mean reward never even reached 195!
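For reference, by "solved" I mean the standard CartPole-v0 criterion. This is roughly how I track it (`episode_rewards` here is just an illustrative list of per-episode returns, not code from my training loop):

```python
import numpy as np

def is_solved(episode_rewards, threshold=195.0, window=100):
    """Return True once the mean reward over the last `window`
    episodes reaches `threshold` (CartPole-v0's solved criterion)."""
    if len(episode_rewards) < window:
        return False
    return float(np.mean(episode_rewards[-window:])) >= threshold
```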
I have already tried increasing the batch size, adding more units to the LSTM's hidden state, increasing the RNN's sequence length, and making the fully connected layer deeper, but every attempt failed: I saw enormous fluctuations in the mean reward, and the model hardly converged at all. Could these be signs of early overfitting?
```python
import torch.nn as nn


class DQN(nn.Module):
    def __init__(self, n_input, output_size, n_hidden, n_layers, dropout=0.3):
        super(DQN, self).__init__()
        self.n_layers = n_layers
        self.n_hidden = n_hidden
        self.lstm = nn.LSTM(input_size=n_input, hidden_size=n_hidden,
                            num_layers=n_layers, dropout=dropout, batch_first=True)
        self.dropout = nn.Dropout(dropout)
        self.fully_connected = nn.Linear(n_hidden, output_size)

    def forward(self, x, hidden_parameters):
        batch_size = x.size(0)
        output, hidden_state = self.lstm(x.float(), hidden_parameters)
        seq_length = output.shape[1]
        # flatten the time dimension so the linear layer sees (batch * seq, hidden)
        output = output.contiguous().view(-1, self.n_hidden)
        output = self.dropout(output)
        output = self.fully_connected(output)
        output = output.view(batch_size, seq_length, -1)
        # keep only the Q-values predicted at the last time step
        output = output[:, -1]
        return output.float(), hidden_state

    def init_hidden(self, batch_size, device):
        # zero-initialized (h_0, c_0) with the same dtype as the model weights
        weight = next(self.parameters()).data
        hidden = (weight.new(self.n_layers, batch_size, self.n_hidden).zero_().to(device),
                  weight.new(self.n_layers, batch_size, self.n_hidden).zero_().to(device))
        return hidden
```
Contrary to what I expected, the simpler model gave much better results than the other, even though RNNs are supposed to be better at processing time-series data.
Can anybody tell me what the reason for this is?
Also, I should mention that I applied no feature engineering, so both DQNs worked with raw observations. Could the RNN outperform the MLP if I used normalized features (i.e., fed both models normalized data)?
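To clarify what I mean by normalization, here is a minimal sketch for CartPole-v0's four state variables. The position and angle scales come from the environment's termination bounds; the two velocity components are actually unbounded, so their scales below are assumed clipping values I would pick, not environment constants:

```python
import numpy as np

# Approximate scales for CartPole-v0's observation:
# cart position, cart velocity, pole angle (rad), pole angular velocity.
# The velocity scales (3.0 and 3.5) are assumptions, since the env leaves them unbounded.
OBS_SCALE = np.array([2.4, 3.0, 0.2095, 3.5], dtype=np.float32)

def normalize(obs):
    """Scale each state variable into roughly [-1, 1] before feeding it to the network."""
    return np.clip(np.asarray(obs, dtype=np.float32) / OBS_SCALE, -1.0, 1.0)
```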
Is there anything you can recommend to improve training efficiency with RNNs and achieve the best results?