Hello there,
I am trying to distribute operations across several neural networks within the same agent. For that purpose I have created a ModuleList of “n_rm_states” networks. My goal is to feed the state tensor forward through one of the networks in the ModuleList, chosen by the value of rm_state. That's why the forward function takes two inputs, the state and the rm_state. In other words, rm_state is just used as an index that selects which network in the ModuleList to use. This is my network code:
import os
import numpy as np
import torch as T
import torch.nn as nn
import torch.optim as optim

class DeepQNetwork(nn.Module):
    def __init__(self, lr, n_actions, name, input_dims, n_rm_states, chkpt_dir):
        super(DeepQNetwork, self).__init__()
        self.checkpoint_dir = chkpt_dir
        self.checkpoint_file = os.path.join(self.checkpoint_dir, name)
        self.n_actions = n_actions
        self.n_rm_states = n_rm_states
        # Each self.rm_network index corresponds to a separate neural network
        # that learns the policy for one specific rm state
        self.rm_network = []
        for i in range(n_rm_states):
            self.rm_network.append(
                nn.Sequential(
                    nn.Linear(np.prod(input_dims), 256), nn.ReLU(),
                    nn.Linear(256, n_actions)
                )
            )
        # nn.ModuleList registers the networks in the Python list as
        # submodules, so their parameters are visible to PyTorch
        self.rm_network = nn.ModuleList(self.rm_network)
        self.optimizer = optim.RMSprop(self.rm_network.parameters(), lr=lr)
        self.loss = nn.MSELoss()
        # device nn.Module function already defined in Agent
        self.device = T.device('cuda:0' if T.cuda.is_available() else 'cpu')
        self.to(self.device)

    def forward(self, state, rm_state):
        # Flatten the observation [BS, 7, 7, 3] --> [BS, 147]
        flat_state = state.view(state.size()[0], -1).to(self.device)
        # Initialize the actions tensor [BS, n_actions=7]
        actions = T.zeros(state.size()[0], self.n_actions, requires_grad=False).to(self.device)
        for i in range(self.n_rm_states):
            # Loop over the rm states and feed each observation forward through
            # its corresponding rm state neural network
            try:
                rm_index = T.where(rm_state == i)[0]
                actions[rm_index] = self.rm_network[i](flat_state[rm_index])
            except:
                pass
        return actions
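For comparison, here is a hedged alternative to the loop in forward. Since every sub-network has the same output shape, one option is to run the whole batch through all of them, stack the results, and pick one row per sample using rm_state. This is only a sketch, not the design above: `route_by_index` is a hypothetical helper, and it assumes rm_state is a 1-D integer tensor of shape [BS] on the same device as the state. It costs n_rm_states times the compute, but it has no in-place writes and no bare except that could silently swallow an error.

```python
import torch as T
import torch.nn as nn

def route_by_index(networks, flat_state, rm_state):
    """Hypothetical helper: evaluate every sub-network, keep one output per sample."""
    # Stack per-network outputs: [n_networks, BS, n_actions]
    all_q = T.stack([net(flat_state) for net in networks], dim=0)
    batch_idx = T.arange(flat_state.size(0), device=flat_state.device)
    # Advanced indexing selects all_q[rm_state[b], b, :] for each sample b
    return all_q[rm_state.long(), batch_idx]

T.manual_seed(0)
# Sizes taken from the comments in forward: 147 inputs, 7 actions
nets = nn.ModuleList(
    [nn.Sequential(nn.Linear(147, 256), nn.ReLU(), nn.Linear(256, 7)) for _ in range(3)]
)
state = T.randn(5, 147)
rm_state = T.tensor([0, 2, 1, 2, 0])
q_values = route_by_index(nets, state, rm_state)
print(q_values.shape)  # torch.Size([5, 7])
```

If both versions return the same values for the same batch, the routing itself is probably not the issue.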
So far it works as intended and no errors pop up, but something weird happens. When I use only one network to train my agent it learns correctly; the problem appears when I add the several neural networks. It looks like it does not backpropagate correctly. I have no idea why, because no errors appear that I could use to debug it. The only symptom I see is the lack of progress in the agent's learning.
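One way to narrow this down is to check, in isolation, which sub-networks actually receive gradients after a backward pass through the same masked-assignment pattern. The following is a minimal sketch under assumptions: `routed_forward` and `grad_norms` are hypothetical helpers, the sizes come from the comments in forward, rm_state is assumed to be a 1-D integer tensor, and the try/except is left out on purpose so that any error surfaces.

```python
import torch as T
import torch.nn as nn

def routed_forward(networks, flat_state, rm_state, n_actions):
    # Same masked-assignment pattern as forward(), minus the try/except
    out = T.zeros(flat_state.size(0), n_actions)
    for i, net in enumerate(networks):
        idx = T.where(rm_state == i)[0]
        out[idx] = net(flat_state[idx])
    return out

def grad_norms(networks):
    # Hypothetical helper: summed gradient norm per sub-network
    return [
        sum(p.grad.norm().item() for p in net.parameters() if p.grad is not None)
        for net in networks
    ]

T.manual_seed(0)
nets = nn.ModuleList(
    [nn.Sequential(nn.Linear(147, 256), nn.ReLU(), nn.Linear(256, 7)) for _ in range(3)]
)
state = T.randn(6, 147)
rm_state = T.tensor([0, 0, 1, 1, 0, 1])  # index 2 never appears in this batch
loss = routed_forward(nets, state, rm_state, 7).pow(2).mean()
loss.backward()
norms = grad_norms(nets)
print(norms)
```

In this toy setting the networks selected by the batch (0 and 1) should report non-zero norms and the unused one (2) should report zero. Running the same check on the real agent, with the real rm_state tensor, would show whether the gradients are missing or the problem lies elsewhere.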
Here is the code I use for backpropagation:
def learn(self):
    if self.memory.mem_cntr < self.batch_size:
        return
    self.q_eval.optimizer.zero_grad()
    self.replace_target_network()
    states, u1s, actions, rewards, states_, u2s, dones = self.sample_memory()
    indices = np.arange(self.batch_size)
    q_pred = self.q_eval.forward(states, u1s)[indices, actions]
    q_next = self.q_next.forward(states_, u2s).max(dim=1)[0]
    q_next[dones] = 0.0
    q_target = rewards + self.gamma*q_next
    loss = self.q_eval.loss(q_target, q_pred).to(self.device)
    loss.backward()
    self.q_eval.optimizer.step()
    self.learn_step_counter += 1
    self.decrement_epsilon()
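The target computation in learn can be isolated and tested on its own. The sketch below assumes that dones is a boolean tensor and rewards a float tensor; `td_target` is a hypothetical name. It computes the target under T.no_grad(), which is a common DQN convention and not something the code above does. Since the optimizer only holds the q_eval parameters, I would expect this to save memory rather than change the update.

```python
import torch as T

def td_target(q_next_all, rewards, dones, gamma):
    """Hypothetical helper: r + gamma * max_a' Q_target(s', a'), zeroed at terminals."""
    with T.no_grad():
        q_next = q_next_all.max(dim=1)[0]        # greedy value of the next state
        q_next = q_next.masked_fill(dones, 0.0)  # no bootstrapping past terminal states
        return rewards + gamma * q_next

# Toy batch of three transitions with two actions each
q_next_all = T.tensor([[1.0, 3.0], [2.0, 0.5], [4.0, 1.0]])
rewards = T.tensor([1.0, 0.0, -1.0])
dones = T.tensor([False, True, False])
target = td_target(q_next_all, rewards, dones, gamma=0.5)
print(target)  # values: 2.5, 0.0, 1.0
```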
If you are familiar with DQN (Deep Q-Learning) in Reinforcement Learning, you will find this code very similar to it. That is because it is DQN, but instead of using one neural network to learn an overall policy, I distribute several policies (one per rm state) across different neural networks (that's why I use a ModuleList of neural networks) and apply Q-learning to each of them.
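Written out, the update that learn performs on a sampled batch can be sketched as follows. The notation is an assumption for illustration only: $Q^{(u)}_{\theta}$ is the q_eval sub-network selected by rm state $u$ (u1s in the code), $Q^{(u')}_{\theta^-}$ is the q_next sub-network selected by the next rm state $u'$ (u2s), $d_j \in \{0, 1\}$ is the done flag, and $B$ is the batch size.

```latex
y_j = r_j + \gamma \,(1 - d_j)\, \max_{a'} Q^{(u'_j)}_{\theta^-}(s'_j, a'),
\qquad
\mathcal{L}(\theta) = \frac{1}{B} \sum_{j=1}^{B} \bigl( y_j - Q^{(u_j)}_{\theta}(s_j, a_j) \bigr)^2
```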
I would appreciate any answers.
Thanks in advance.