Hello,
I am quite new to RL and have been following the DQN tutorial
(https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html)
trying to adapt it to the Bao board game:
- the environment accepts any opponent but for now I am using a random opponent
- board is 4x12, with the bottom two rows being the player’s side
I have been training it, but the training loop sometimes hangs. It doesn’t slow down progressively; it simply stops at a random step, with no error. When I interrupt it, the traceback shows it was inside the following function:
```python
def allowed_actions_mask(state):
    my_status = state[:, 2:, :]  # bottom two rows are my side of the board
    action_mask = torch.zeros(my_status.shape, dtype=torch.bool, device=my_status.device)
    # "normal game" while at least one of my pits holds more than one stone
    normal_game = torch.amax(my_status, dim=(1, 2)) > 1
    action_mask[normal_game, :] = my_status[normal_game, :] > 1
    action_mask[~normal_game, :] = my_status[~normal_game, :] > 0
    return action_mask
```
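For reference, here is a standalone sanity check of the mask logic (my assumption: board tensors have shape `(batch, 4, 12)` with integer stone counts):

```python
import torch

def allowed_actions_mask(state):
    my_status = state[:, 2:, :]  # bottom two rows are my side of the board
    action_mask = torch.zeros(my_status.shape, dtype=torch.bool, device=my_status.device)
    # "normal game" while at least one of my pits holds more than one stone
    normal_game = torch.amax(my_status, dim=(1, 2)) > 1
    action_mask[normal_game, :] = my_status[normal_game, :] > 1
    action_mask[~normal_game, :] = my_status[~normal_game, :] > 0
    return action_mask

# Endgame-style board: every pit holds at most one stone,
# so the (my_status > 0) branch applies.
state = torch.zeros(1, 4, 12, dtype=torch.long)
state[0, 2, 0] = 1
print(allowed_actions_mask(state).sum().item())  # → 1 (only that pit is playable)
```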
The mask is applied to the output of the Q-network as follows:
```python
def forward(self, state):
    flattened_state = torch.flatten(state, start_dim=1)
    encoded_state = torch.nn.functional.one_hot(flattened_state, num_classes=self.max_stones).float()
    prediction = self.net(encoded_state).reshape(-1, 2, self.width)
    allowed = self.allowed_actions_mask(state)
    prediction[~allowed] = float('-inf')
    del allowed, flattened_state, encoded_state
    torch.cuda.empty_cache()
    return prediction
```
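To illustrate the masking step in isolation (plain tensors, no network; the `-inf` fill pattern is the same as in the code above):

```python
import torch

# Fake Q-values for a batch of 2 states, 2 rows x 4 pits each.
prediction = torch.arange(16, dtype=torch.float).reshape(2, 2, 4)
allowed = torch.ones(2, 2, 4, dtype=torch.bool)
allowed[1] = False                     # second state: no legal move at all
prediction[~allowed] = float('-inf')

print(torch.isinf(prediction[1]).all().item())  # → True
# Note: if a state's mask is all False, every Q-value becomes -inf,
# and a softmax over that row is all NaN downstream.
print(torch.softmax(prediction[1].flatten(), dim=0))
```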
- Any idea why learning would hang at a random step, with no error?
- How should I approach debugging this?
I checked GPU utilization and RAM, and both are well below their limits…
Thanks!