How deep a neural network is needed for 12 inputs ranging from -5000 to 5000 in A3C reinforcement learning

I am trying to use A3C with an LSTM for an environment where each state has 12 inputs ranging from -5000 to 5000. I am using an LSTM layer of size 12, then 2 fully connected hidden layers of size 256, then one fully connected layer for the 3-dimensional action output and one fully connected layer for the scalar value function. The reward is in the range (-1, 1).

However, during initial training I am unable to get good results.

My question is: is this neural network good enough for this kind of environment, or is the poor initial performance due to the LSTM?

Below is the code for the actor-critic network:

import os
import torch
import torch.nn as nn


class ActorCritic(torch.nn.Module):

    def __init__(self, params):
        super(ActorCritic, self).__init__()

        self.state_dim = params.state_dim
        self.action_space = params.action_dim
        self.hidden_size = params.hidden_size
        state_dim = params.state_dim
        # LSTM cell whose hidden size equals the state size (12)
        self.lstm = nn.LSTMCell(state_dim, state_dim)
        # Layer sizes: state_dim followed by params.layers hidden layers of hidden_size
        lst = [state_dim]
        for i in range(params.layers):
            lst.append(params.hidden_size)
        self.hidden = nn.ModuleList()
        for k in range(len(lst) - 1):
            self.hidden.append(nn.Linear(lst[k], lst[k + 1]))

        self.critic_linear = nn.Linear(params.hidden_size, 1)
        self.actor_linear = nn.Linear(params.hidden_size, self.action_space)

    def forward(self, inputs):
        inputs, (hx, cx) = inputs
        inputs = inputs.reshape(1, -1)          # (state_dim,) -> (1, state_dim)
        hx, cx = self.lstm(inputs, (hx, cx))
        x = hx
        for layer in self.hidden:
            x = torch.tanh(layer(x))
        # value estimate, action logits, and the updated LSTM state
        return self.critic_linear(x), self.actor_linear(x), (hx, cx)

class Params():
    def __init__(self):
        self.lr = 0.0001  # learning rate
        self.gamma = 0.99
        self.tau = 1.
        self.num_processes = os.cpu_count()
        self.state_dim = 12
        self.action_dim = 3
        self.hidden_size = 256
        self.layers = 2
        self.lstm_layers = 1
        self.lstm_size = self.state_dim
        self.num_steps = 20
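
For completeness, this is roughly how I run a single forward pass (the zero hidden state and the random placeholder observation are just for illustration):

params = Params()
model = ActorCritic(params)

# The LSTM hidden state is reset to zeros at the start of an episode
hx = torch.zeros(1, params.lstm_size)
cx = torch.zeros(1, params.lstm_size)

state = torch.randn(params.state_dim)  # placeholder for a real 12-dim observation
value, logits, (hx, cx) = model((state, (hx, cx)))
# value: (1, 1) state-value estimate, logits: (1, 3) raw actor outputs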

The LSTM is a type of recurrent neural network used for detecting and recognizing sequential patterns over a given number of time steps, or in time-series data.

So unless the rewards of your RL environment arrive in a sequential manner during exploration/exploitation, I suggest you use a different architecture.

Anyway, even without all these details, I would still suggest you change your architecture just to check whether the problem really comes from the network or from something else you did or failed to do.

Just saying 🤷‍♂️🙃

Thanks for the reply.

Actually, I want my model to remember some information about the past, which is why I am using an LSTM.
However, I am not sure whether states with values in this range can be handled by the neural network.

Yes, it can be handled by a neural network.
I don't really see any other practical way of doing this without a neural network. Unless, for some strange reason, you have access to petabytes of RAM and a processor that can process petabytes of data in seconds, in which case standard tabular Q-learning would suffice, but you and I know that's not really possible, lol 😅

So yeah.
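
That said, with raw inputs in the range -5000 to 5000 it usually helps to scale them into something like [-1, 1] before they go into the LSTM. A minimal sketch (dividing by 5000 is just an assumption based on the range you described):

import torch

def normalize_state(state, scale=5000.0):
    # Map raw inputs from roughly [-5000, 5000] into [-1, 1]
    return torch.clamp(state / scale, -1.0, 1.0)

raw_state = torch.tensor([4200.0, -3100.0, 0.0, 5000.0] + [1.0] * 8)  # example 12-dim observation
state = normalize_state(raw_state)  # feed this to the model instead of the raw values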

Then again, can you give me a brief rundown of the environment you are using?

Also, when you said you want the model to remember some information about the past:

What exactly is it you want the model to remember?
Is it the previous states and the actions it took in them?

Yes, I want it to remember state information from the past.

Well, you can still use an LSTM with some dense layers, as you initially did.
There's really no rule of thumb here or anything like that.
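
If what you want it to remember is past state information, the main thing is to carry the LSTM's (hx, cx) across the steps of a rollout and only reset them when a new episode starts. A rough sketch, assuming the ActorCritic/Params you posted and a hypothetical old gym-style env with reset()/step():

params = Params()
model = ActorCritic(params)

hx = torch.zeros(1, params.lstm_size)
cx = torch.zeros(1, params.lstm_size)
state = torch.tensor(env.reset(), dtype=torch.float32)  # env: hypothetical environment, old 4-tuple step API

for t in range(params.num_steps):
    value, logits, (hx, cx) = model((state, (hx, cx)))
    action = torch.softmax(logits, dim=-1).multinomial(1)
    obs, reward, done, info = env.step(action.item())
    state = torch.tensor(obs, dtype=torch.float32)
    if done:
        # only forget the past when a new episode begins
        hx = torch.zeros(1, params.lstm_size)
        cx = torch.zeros(1, params.lstm_size)
        state = torch.tensor(env.reset(), dtype=torch.float32)

(In actual A3C training you would also detach hx and cx between update windows so gradients don't flow through the entire history.)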


I'm kind of curious, though: how would it affect the decision it takes next?
Because from my knowledge, given a state, the action selected is the one with the maximum approximate Q-value output by the network.

Or is there something else you wish to do with this?
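
One small caveat on the Q-value part: with an actor-critic network like the one you posted, the actor head outputs policy logits rather than Q-values, so during training the action is usually sampled from the softmax instead of taking the maximum. A tiny sketch (the random logits are just a stand-in for the actor output):

import torch

logits = torch.randn(1, 3)                     # stand-in for the actor head's output
probs = torch.softmax(logits, dim=-1)          # policy over the 3 actions
dist = torch.distributions.Categorical(probs)
action = dist.sample()                         # stochastic action used during training
greedy_action = probs.argmax(dim=-1)           # deterministic max-value choice, e.g. at evaluation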