I am trying to distribute operations across different neural networks within the same agent. For that purpose I have created a ModuleList containing "n_rm_states" networks. My goal is to feed the state tensor forward through one of the networks in the ModuleList, chosen according to rm_state. That is why the forward function takes two inputs, the state and the rm_state. In other words, rm_state is just used as an index that selects which network in the ModuleList to use. This is my network code:
import os
import numpy as np
import torch as T
import torch.nn as nn
import torch.optim as optim

class DeepQNetwork(nn.Module):
    def __init__(self, lr, n_actions, name, input_dims, n_rm_states, chkpt_dir):
        super(DeepQNetwork, self).__init__()
        self.checkpoint_dir = chkpt_dir
        self.checkpoint_file = os.path.join(self.checkpoint_dir, name)
        self.n_actions = n_actions
        self.n_rm_states = n_rm_states

        # Each self.rm_network index corresponds to a separate neural network
        # that learns the policy for one specific rm state
        self.rm_network = []
        for i in range(n_rm_states):
            self.rm_network.append(
                nn.Sequential(
                    nn.Linear(np.prod(input_dims), 256),
                    nn.ReLU(),
                    nn.Linear(256, n_actions)
                )
            )
        # nn.ModuleList makes PyTorch register the Python list as a
        # list of nn.Module objects
        self.rm_network = nn.ModuleList(self.rm_network)

        self.optimizer = optim.RMSprop(self.rm_network.parameters(), lr=lr)
        self.loss = nn.MSELoss()

        # device nn.Module function already defined in Agent
        self.device = T.device('cuda:0' if T.cuda.is_available() else 'cpu')
        self.to(self.device)

    def forward(self, state, rm_state):
        # Flatten the observation [BS, 7, 7, 3] --> [BS, 147]
        flat_state = state.view(state.size()[0], -1).to(self.device)
        # Initialize the actions tensor [BS, n_actions=7]
        actions = T.zeros(state.size()[0], self.n_actions, requires_grad=False).to(self.device)
        for i in range(self.n_rm_states):
            # Loop over the rm_states and feed each observation forward
            # through the network of its corresponding rm state
            try:
                rm_index = T.where(rm_state == i)[0]
                actions[rm_index] = self.rm_network[i](flat_state[rm_index])
            except:
                pass
        return actions
So far it runs without raising any errors, but something weird happens. When I use only one network to train my agent, it learns correctly. The problem appears when I use several networks: it looks like it does not backpropagate correctly. I have no idea why, because there are no error messages to debug. The only symptom I see is that the agent makes no progress in learning.
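One way I could check whether gradients actually reach every network is a small standalone test like the one below. It is only a sketch under my own assumptions: the sizes, the `route` helper and the toy `rm_state` tensor are hypothetical stand-ins rather than my real agent, and `rm_state` is assumed to be a 1-D tensor of shape [BS].

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy sizes, not the ones used by my agent
n_rm_states, n_actions, in_dim, batch = 3, 4, 6, 8

# Stand-in for the ModuleList built in DeepQNetwork.__init__
nets = nn.ModuleList([
    nn.Sequential(nn.Linear(in_dim, 16), nn.ReLU(), nn.Linear(16, n_actions))
    for _ in range(n_rm_states)
])

def route(state, rm_state):
    """Send each row of `state` through the network selected by `rm_state`."""
    out = torch.zeros(state.size(0), n_actions)
    for i in range(n_rm_states):
        idx = torch.where(rm_state == i)[0]  # 1-D tensor of row indices
        if idx.numel() > 0:
            # No try/except here, so any indexing error is raised
            out[idx] = nets[i](state[idx])
    return out

state = torch.randn(batch, in_dim)
rm_state = torch.tensor([0, 1, 2, 0, 1, 2, 0, 1])  # assumed shape [batch]

q_values = route(state, rm_state)
q_values.sum().backward()

# True for a sub-network if all of its parameters received a gradient
has_grad = [all(p.grad is not None for p in net.parameters()) for net in nets]
print(has_grad)
```

If some entry were False here, that sub-network would never be updated by the optimizer. The sketch deliberately leaves out the bare `try`/`except` so that a shape or indexing problem would raise an exception instead of silently leaving zeros in the output.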
Here is the code I use for backpropagation:
def learn(self):
    if self.memory.mem_cntr < self.batch_size:
        return

    self.q_eval.optimizer.zero_grad()
    self.replace_target_network()

    states, u1s, actions, rewards, states_, u2s, dones = self.sample_memory()
    indices = np.arange(self.batch_size)

    q_pred = self.q_eval.forward(states, u1s)[indices, actions]
    q_next = self.q_next.forward(states_, u2s).max(dim=1)[0]
    q_next[dones] = 0.0

    q_target = rewards + self.gamma*q_next

    loss = self.q_eval.loss(q_target, q_pred).to(self.device)
    loss.backward()
    self.q_eval.optimizer.step()

    self.learn_step_counter += 1
    self.decrement_epsilon()
If you are familiar with DQN (Deep Q-Learning) in Reinforcement Learning, you will find this code very similar to it. That is because it is DQN, but instead of using one neural network to learn an overall policy, I distribute several policies (one per rm state) across different neural networks (that is why I use a ModuleList of networks) and apply Q-learning to each of them.
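For reference, the update I believe `learn` computes can be written as below. The notation is my own shorthand and not taken from any paper: $u_b$ and $u'_b$ are the rm states of the current and next observation (`u1s`, `u2s`), $Q_u$ is the evaluation network selected by rm state $u$, $Q^{-}_{u'}$ is the corresponding target network, $d_b$ is the done flag, and $B$ is the batch size.

```latex
y_b = r_b + \gamma \,(1 - d_b)\, \max_{a'} Q^{-}_{u'_b}(s'_b, a'),
\qquad
\mathcal{L} = \frac{1}{B} \sum_{b=1}^{B} \bigl( y_b - Q_{u_b}(s_b, a_b) \bigr)^2
```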
I would appreciate any answers.