Backpropagating through different NN in ModuleList

Zurisen · April 8, 2021, 4:22pm

Hello there,
I am trying to distribute operations across several neural networks within the same agent. For that purpose I have created a ModuleList of “n_rm_states” networks. My goal is to feed the state tensor forward through one of the networks in the ModuleList, chosen according to the rm_state that is also passed as input; that's why the forward function takes two inputs, the state and the rm_state. So basically, the rm_state is just used as an index to select which neural network in the ModuleList to use. This is my network code:

import os

import numpy as np
import torch as T
import torch.nn as nn
import torch.optim as optim


class DeepQNetwork(nn.Module):
    def __init__(self, lr, n_actions, name, input_dims, n_rm_states, chkpt_dir):
        super(DeepQNetwork, self).__init__()
        self.checkpoint_dir = chkpt_dir
        self.checkpoint_file = os.path.join(self.checkpoint_dir, name)
        self.n_actions = n_actions
        self.n_rm_states = n_rm_states

        # each self.rm_network index corresponds to a separate neural network
        # that learns the policy for one specific rm state
        self.rm_network = []
        for i in range(n_rm_states):
            self.rm_network.append(
                    nn.Sequential(
                        nn.Linear(np.prod(input_dims), 256),nn.ReLU(),
                        nn.Linear(256, n_actions)
                        )
                    )
        # nn.ModuleList registers the networks in the Python list as
        # submodules, so PyTorch can see their parameters
        self.rm_network = nn.ModuleList(self.rm_network)

        self.optimizer = optim.RMSprop(self.rm_network.parameters(), lr=lr)

        self.loss = nn.MSELoss()

        # device nn.Module function already defined in Agent
        self.device = T.device('cuda:0' if T.cuda.is_available() else 'cpu')
        self.to(self.device)

    def forward(self, state, rm_state):

        # Flatten the observation [BS, 7, 7, 3] --> [BS, 147]
        flat_state = state.view(state.size()[0], -1).to(self.device)

        # Initialize the actions tensor [BS, n_actions=7]
        actions = T.zeros(state.size()[0], self.n_actions, requires_grad=False).to(self.device)
        for i in range(self.n_rm_states):
            # Loop over the rm_states and feed each observation forward
            # through the neural network of its corresponding rm_state
            try:
                rm_index = T.where(rm_state == i)[0]
                actions[rm_index] = self.rm_network[i](flat_state[rm_index])
            except:
                pass

        return actions

So far it runs as intended and no errors pop up, but something weird happens. When I use only one network to train my agent, it learns correctly; the problem appears when I add these several neural networks. It looks like it does not backpropagate correctly. I have no idea why, because no errors appear that I could debug; the only symptom I see is that the agent makes no progress in learning.
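
One way to check whether gradients actually reach each sub-network is to run a single forward and backward pass on a tiny batch and inspect the `.grad` attributes. The snippet below is only a minimal sketch of that check, not the agent code itself: `nets`, `forward`, the batch size and the chosen `rm_state` values are hypothetical stand-ins that mirror the shapes used above (7x7x3 observations, 7 actions), and the `try`/`except` is replaced by an explicit emptiness check so that no error can be hidden.

```python
import torch as T
import torch.nn as nn

T.manual_seed(0)

n_rm_states, n_actions, in_dim = 3, 7, 147

# Hypothetical stand-in for self.rm_network: one small MLP per rm_state
nets = nn.ModuleList([
    nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, n_actions))
    for _ in range(n_rm_states)
])


def forward(state, rm_state):
    # Flatten [BS, 7, 7, 3] --> [BS, 147]
    flat_state = state.view(state.size(0), -1)
    actions = T.zeros(state.size(0), n_actions)
    for i in range(n_rm_states):
        rm_index = T.where(rm_state == i)[0]
        if rm_index.numel() > 0:  # skip rm_states absent from this batch
            actions[rm_index] = nets[i](flat_state[rm_index])
    return actions


state = T.randn(4, 7, 7, 3)
rm_state = T.tensor([0, 0, 2, 2])  # rm_state 1 is deliberately never selected

out = forward(state, rm_state)
out.sum().backward()

# A network that took part in the forward pass has non-None gradients
has_grad = [any(p.grad is not None for p in net.parameters()) for net in nets]
print(has_grad)
```

In this sketch only the networks whose index occurs in `rm_state` receive gradients; a network that never receives any during real training would point at the routing (for example the type or shape of `rm_state`) rather than at the optimizer.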

Here is the code I use to backpropagate:

    def learn(self):
        if self.memory.mem_cntr < self.batch_size:
            return

        self.q_eval.optimizer.zero_grad()

        self.replace_target_network()

        states, u1s, actions, rewards, states_, u2s, dones = self.sample_memory()
        indices = np.arange(self.batch_size)

        q_pred = self.q_eval.forward(states, u1s)[indices, actions]
        q_next = self.q_next.forward(states_, u2s).max(dim=1)[0]

        q_next[dones] = 0.0
        q_target = rewards + self.gamma*q_next

        loss = self.q_eval.loss(q_target, q_pred).to(self.device)
        loss.backward()

        self.q_eval.optimizer.step()
        self.learn_step_counter += 1

        self.decrement_epsilon()

If you are familiar with DQN (Deep Q-Learning) in Reinforcement Learning, you will find this code very similar to it. That is because it is DQN, but instead of using one neural network to learn an overall policy, I distribute several policies (one per rm state) across different neural networks (that's why I use a ModuleList of neural networks) and apply Q-learning to them.
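
For reference, the target computation in learn() can be looked at in isolation. In DQN the target is usually treated as a constant, which is commonly done by building it under `T.no_grad()`; the code above does not do this explicitly. The sketch below uses made-up numbers (`q_next_all`, `q_pred`, `rewards`, `dones` and `gamma` are illustrative values, not taken from the agent) and is only an assumption about how that step could be written, not a confirmed cause of the problem described.

```python
import torch as T
import torch.nn as nn

gamma = 0.99
rewards = T.tensor([1.0, 0.0, 0.5])
dones = T.tensor([False, True, False])

# Illustrative stand-ins for the outputs of q_next.forward(...) and q_pred
q_next_all = T.tensor([[0.2, 1.0], [3.0, 2.0], [0.0, -1.0]], requires_grad=True)
q_pred = T.tensor([1.5, 0.2, 0.4], requires_grad=True)

with T.no_grad():
    # Target values are built without recording a graph
    q_next = q_next_all.max(dim=1)[0]
    q_next[dones] = 0.0  # terminal transitions contribute no future value
    q_target = rewards + gamma * q_next

# nn.MSELoss takes (input, target): prediction first, constant target second
loss = nn.MSELoss()(q_pred, q_target)
loss.backward()

print(q_target)         # the target carries no gradient history
print(q_next_all.grad)  # nothing flows back into the target-side values
```

Written this way, gradients flow only into the prediction side (`q_pred` here), while the target side stays untouched by `backward()`.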

I would appreciate your answers.
Thanks in advance.