I'm working on a model that learns chess through DDQN reinforcement learning. In short, in this specific snippet of the code:
state_tensor = torch.FloatTensor(state).unsqueeze(0) # Ensure state is a 2D tensor with shape (1, 768)
# Get Q-values for the current state
q_values = env.q_network(torch.FloatTensor(state_tensor))
# Mask out invalid actions
valid_actions = env.get_valid_actions() # This method should return a binary mask of valid actions
# Convert valid_actions to a tensor and reshape to match q_values shape
valid_actions = torch.tensor(valid_actions, dtype=torch.bool).unsqueeze(0) # Now valid_actions has shape [1, 4672]
# Assuming q_values is obtained from the neural network output
#q_values = torch.tensor(q_values, dtype=torch.float32) # Convert q_values to tensor if it isn't already
# Verify the shapes
print(f'q_values shape: {q_values.shape}') # Expected output: [1, 4672]
print(f'valid_actions shape: {valid_actions.shape}') # Expected output: [1, 4672]
# Apply the mask
q_values[~valid_actions] = float('-inf')
I keep getting this error:
IndexError: The shape of the mask [1, 4672] at index 1 does not match the shape of the indexed tensor [1, 1, 4672] at index 1
For some reason the tensor always ends up with one more dimension than the mask. I realize it is probably because of the unsqueeze(0), but I have tried countless combinations to make the tensor and the mask the same dimensions and it just wouldn't work. In the end it should have the form [1, 4672].
Tried:
#q_values = torch.tensor(q_values, dtype=torch.float32) # Convert q_values to tensor if it isn't already
valid_actions = torch.tensor(valid_actions, dtype=torch.bool).unsqueeze(0) # Now valid_actions has shape [1, 4672]
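A minimal standalone sketch (with a dummy linear layer standing in for q_network, and assuming the state returned by get_state already carries a batch dimension, so the shapes are illustrative only) shows where the extra dimension comes from:
import numpy as np
import torch

q_network = torch.nn.Linear(768, 4672)   # stand-in for env.q_network, assumed input/output sizes
state = np.random.randn(1, 768)          # a state that already has a leading batch dimension
state_tensor = torch.FloatTensor(state).unsqueeze(0)  # second batch dim -> shape (1, 1, 768)
q_values = q_network(state_tensor)       # nn.Linear keeps the leading dims -> (1, 1, 4672)
valid_actions = torch.randint(0, 2, (4672,)).bool().unsqueeze(0)  # shape (1, 4672)
print(q_values.shape, valid_actions.shape)
# q_values[~valid_actions] then raises:
# IndexError: The shape of the mask [1, 4672] at index 1 does not match
# the shape of the indexed tensor [1, 1, 4672] at index 1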
Removing a singleton dimension works fine:
q_values = torch.randn(1, 1, 4672)
valid_actions = torch.randint(0, 2, (1, 4672)).bool()
q_values[~valid_actions] = float('-inf')
# IndexError: The shape of the mask [1, 4672] at index 1 does not match the shape of the indexed tensor [1, 1, 4672] at index 1
# works
q_values = q_values.squeeze(0)
q_values[~valid_actions] = float('-inf')
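If you would rather not track the batch dimension by hand, masked_fill broadcasts the mask against the tensor, so the same line works whether q_values is [1, 4672] or [1, 1, 4672] (again with random tensors, just to illustrate the shapes):
import torch

q_values = torch.randn(1, 1, 4672)
valid_actions = torch.randint(0, 2, (1, 4672)).bool()
# The [1, 4672] mask broadcasts against the [1, 1, 4672] tensor, so no squeeze is needed
q_values = q_values.masked_fill(~valid_actions, float('-inf'))
print(q_values.shape)  # torch.Size([1, 1, 4672])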
-
I didn't exactly get what you are trying to do with the first two lines.
-
For more context, the raw original code was without this line:
valid_actions = torch.tensor(valid_actions, dtype=torch.bool).unsqueeze(0) # Now valid_actions has shape [1, 4672]
and it was working just fine in my old code. The only thing I did in the new code is add a board hash in the get_state method and use Zobrist keys in independent methods that don't affect this part of the code, but in the new code, without the line above, I'm getting this error:
IndexError: The shape of the mask [4672] at index 0 does not match the shape of the indexed tensor [1, 4672] at index 0
and it shows that the shape of valid_actions is the one that is wrong, since it comes out as (4672,).
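Random tensors with exactly those shapes reproduce it, and adding the batch dimension to the mask (which is what the extra line does) makes the shapes line up; this is just an illustration of the shapes, not the project code:
import torch

q_values = torch.randn(1, 4672)
valid_actions = torch.zeros(4672, dtype=torch.bool)  # shape (4672,), like get_valid_actions returns
# q_values[~valid_actions] raises:
# IndexError: The shape of the mask [4672] at index 0 does not match
# the shape of the indexed tensor [1, 4672] at index 0
q_values[~valid_actions.unsqueeze(0)] = float('-inf')  # mask is now [1, 4672] and matches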
Please help, I've been stuck on this for 3 days.
I created random tensors reproducing exactly the same error you are seeing.
The second part of the code fixes the issue afterwards by explicitly squeezing the unneeded dimension. You can copy/paste the code to reproduce your reported error and also run the fixed code.
Now I got this error:
q_values[~valid_actions] = float('-inf')
IndexError: too many indices for tensor of dimension 1
And here is the output:
Forward pass input shape: torch.Size([1, 768])
q_values shape: torch.Size([1, 4672])
valid_actions shape: torch.Size([1, 4672])
It just keeps going around in circles back to the same error.
My code snippet works, so could you post a minimal and executable code snippet reproducing the new error?
Here is the whole relevant part of it from the training code (if you want me to post the whole code, I can):
state_tensor = torch.FloatTensor(state).unsqueeze(0) # Ensure state is a 2D tensor with shape (1, 768)
# Get Q-values for the current state
q_values = env.q_network(torch.FloatTensor(state_tensor))
# Mask out invalid actions
valid_actions = env.get_valid_actions() # This method should return a binary mask of valid actions
# Convert valid_actions to a tensor and reshape to match q_values shape
valid_actions = torch.tensor(valid_actions, dtype=torch.bool).unsqueeze(0) # Now valid_actions has shape [1, 4672]
# Assuming q_values is obtained from the neural network output
#q_values = torch.tensor(q_values, dtype=torch.float32) # Convert q_values to tensor if it isn't already
# Verify the shapes
print(f'q_values shape: {q_values.shape}') # Expected output: [1, 4672]
print(f'valid_actions shape: {valid_actions.shape}') # Expected output: [1, 4672]
# Apply the mask
q_values = q_values.squeeze(0) # Remove the batch dimension
q_values[~valid_actions] = float('-inf')
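With random tensors of the shapes printed above, the same combination (a squeezed 1-D q_values indexed by a 2-D mask) reproduces the "too many indices" error; this is only an illustration of the shapes, not the project code:
import torch

q_values = torch.randn(1, 4672).squeeze(0)               # 1-D after the squeeze: shape [4672]
valid_actions = torch.zeros(1, 4672, dtype=torch.bool)   # still 2-D: shape [1, 4672]
# q_values[~valid_actions] raises:
# IndexError: too many indices for tensor of dimension 1
# because a 2-D boolean mask is being used to index a 1-D tensor.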
And here are all the methods used from the chess_env:
def generate_all_moves(self):
    return [move.uci() for move in self.board.legal_moves]

def get_action_size(self):
    return 4672  # Maximum number of legal moves in any position

def get_valid_actions(self):
    valid_actions = np.zeros(self.get_action_size(), dtype=bool)
    for move in self.board.legal_moves:
        action = self.move_to_action(move)  # This method should convert a move to an action index
        valid_actions[action] = True
    return valid_actions

def make_move(self, move):
    if move not in self.actions:
        raise ValueError(f"Illegal move: {move}")
    print(f"Making move: {move}")
    self.board.push_uci(move)
    self.actions = self.generate_all_moves()  # Refresh actions after making a move

def move_to_action(self, move):
    # Convert the move to a string using UCI notation
    move_str = move.uci()
    # Use a hash function to convert the string to an integer
    action = hash(move_str)
    # Modulo by the action size to ensure the action is within the valid range
    action = action % self.get_action_size()
    return action
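Pulling those methods into a minimal standalone class (an assumption about how they are wired together, using python-chess and numpy) shows the shape that get_valid_actions actually returns, and hence why the mask needs a batch dimension (or broadcasting) before it can index a [1, 4672] q_values tensor:
import chess
import numpy as np
import torch

class ChessEnvSketch:
    """Minimal stand-in wrapping only the methods quoted above."""
    def __init__(self):
        self.board = chess.Board()

    def get_action_size(self):
        return 4672

    def move_to_action(self, move):
        return hash(move.uci()) % self.get_action_size()

    def get_valid_actions(self):
        valid_actions = np.zeros(self.get_action_size(), dtype=bool)
        for move in self.board.legal_moves:
            valid_actions[self.move_to_action(move)] = True
        return valid_actions

env = ChessEnvSketch()
mask = env.get_valid_actions()
print(mask.shape)                                        # (4672,) -- a 1-D numpy array
valid_actions = torch.tensor(mask, dtype=torch.bool).unsqueeze(0)
print(valid_actions.shape)                               # torch.Size([1, 4672])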