Dimensions of mask and tensor stay different

I'm working on a model that learns chess through DDQN reinforcement learning. In short, in this specific snippet of the code:

state_tensor = torch.FloatTensor(state).unsqueeze(0)  # Ensure state is a 2D tensor with shape (1, 768)
# Get Q-values for the current state
q_values = env.q_network(torch.FloatTensor(state_tensor))  


# Mask out invalid actions
valid_actions = env.get_valid_actions()  # This method should return a binary mask of valid actions
# Convert valid_actions to a tensor and reshape to match q_values shape
valid_actions = torch.tensor(valid_actions, dtype=torch.bool).unsqueeze(0)  # Now valid_actions has shape [1, 4672]

# Assuming q_values is obtained from the neural network output
#q_values = torch.tensor(q_values, dtype=torch.float32)  # Convert q_values to tensor if it isn't already

# Verify the shapes
print(f'q_values shape: {q_values.shape}')  # Expected output: [1, 4672]
print(f'valid_actions shape: {valid_actions.shape}')  # Expected output: [1, 4672]
# Apply the mask
q_values[~valid_actions] = float('-inf')

I keep getting this error: IndexError: The shape of the mask [1, 4672] at index 1 does not match the shape of the indexed tensor [1, 1, 4672] at index 1

For some reason the tensor dimensions are always the mask's dimensions + 1. I realize it is probably because of the "unsqueeze(0)", but I have tried what feels like infinite combinations to make the tensor and the mask the same dimensions and it just wouldn't work. In the end it should be in the form [1, 4672].
I tried:

#q_values = torch.tensor(q_values, dtype=torch.float32)  # Convert q_values to tensor if it isn't already
valid_actions = torch.tensor(valid_actions, dtype=torch.bool).unsqueeze(0)  # Now valid_actions has shape [1, 4672]

Removing a singleton dimension works fine:

import torch

q_values = torch.randn(1, 1, 4672)
valid_actions = torch.randint(0, 2, (1, 4672)).bool()

q_values[~valid_actions] = float('-inf')
# IndexError: The shape of the mask [1, 4672] at index 1 does not match the shape of the indexed tensor [1, 1, 4672] at index 1

# works
q_values = q_values.squeeze(0)
q_values[~valid_actions] = float('-inf')
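
As an alternative sketch, masked_fill_ only needs the mask to be broadcastable to the tensor, so you can also leave q_values alone and let broadcasting align the [1, 4672] mask with the [1, 1, 4672] tensor:

import torch

q_values = torch.randn(1, 1, 4672)
valid_actions = torch.randint(0, 2, (1, 4672)).bool()

# the [1, 4672] mask broadcasts against the [1, 1, 4672] tensor, no squeeze needed
q_values.masked_fill_(~valid_actions, float('-inf'))
print(q_values.shape)  # torch.Size([1, 1, 4672])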
  1. I didn't exactly get what you are trying to do with the first two lines.

  2. For more context, the raw original code was without this line:

valid_actions = torch.tensor(valid_actions, dtype=torch.bool).unsqueeze(0)  # Now valid_actions has shape [1, 4672]

and it was working just fine in my old code. The only thing I did in the new code is add a board hash in the get_state method and use Zobrist keys in independent methods that don't affect this part of the code, but in the new code, without the line above, I'm getting this error:
IndexError: The shape of the mask [4672] at index 0 does not match the shape of the indexed tensor [1, 4672] at index 0
and it shows that the shape of valid_actions is the one that is wrong, coming out as (4672,).
Please help, I've been stuck on this for 3 days.

I created random tensors reproducing exactly the same error you are seeing.
The second part of the code fixes the issue afterwards by explicitly squeezing the unneeded dimension. You can copy/paste the code to reproduce your reported error and to also run the fixed code.
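
Regarding your question 1: those first two lines just fabricate dummy tensors with the same shapes your error message reports, so the problem can be reproduced without the chess environment. Roughly:

import torch

# stand-ins with the shapes from your error message; no network or board needed
q_values = torch.randn(1, 1, 4672)                     # same shape as the indexed tensor in the error
valid_actions = torch.randint(0, 2, (1, 4672)).bool()  # random True/False mask with the mask's shape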

Now I got this error:
q_values[~valid_actions] = float('-inf')
IndexError: too many indices for tensor of dimension 1

And here is the output:
Forward pass input shape: torch.Size([1, 768])
q_values shape: torch.Size([1, 4672])
valid_actions shape: torch.Size([1, 4672])

It just keeps going around in circles to the same error.

My code snippet works, so could you post a minimal and executable code snippet reproducing the new error?

Here is the whole relevant part from the training code (if you want me to post the whole code, I could):

        state_tensor = torch.FloatTensor(state).unsqueeze(0)  # Ensure state is a 2D tensor with shape (1, 768)
        # Get Q-values for the current state
        q_values = env.q_network(torch.FloatTensor(state_tensor))  


        # Mask out invalid actions
        valid_actions = env.get_valid_actions()  # This method should return a binary mask of valid actions
        # Convert valid_actions to a tensor and reshape to match q_values shape
        valid_actions = torch.tensor(valid_actions, dtype=torch.bool).unsqueeze(0)  # Now valid_actions has shape [1, 4672]

        # Assuming q_values is obtained from the neural network output
        #q_values = torch.tensor(q_values, dtype=torch.float32)  # Convert q_values to tensor if it isn't already

        # Verify the shapes
        print(f'q_values shape: {q_values.shape}')  # Expected output: [1, 4672]
        print(f'valid_actions shape: {valid_actions.shape}')  # Expected output: [1, 4672]
        # Apply the mask
        q_values = q_values.squeeze(0)  # Remove the batch dimension
        q_values[~valid_actions] = float('-inf')

And here are all the methods used from the chess_env:


    def generate_all_moves(self):
        return [move.uci() for move in self.board.legal_moves]

    def get_action_size(self):
        return 4672  # Maximum number of legal moves in any position
        
    def get_valid_actions(self):
        valid_actions = np.zeros(self.get_action_size(), dtype=bool)
        for move in self.board.legal_moves:
            action = self.move_to_action(move)  # This method should convert a move to an action index
            valid_actions[action] = True
        return valid_actions

    def make_move(self, move):
        if move not in self.actions:
            raise ValueError(f"Illegal move: {move}")
        print(f"Making move: {move}")
        self.board.push_uci(move)
        self.actions = self.generate_all_moves()  # Refresh actions after making a move

    def move_to_action(self, move):
        # Convert the move to a string using UCI notation
        move_str = move.uci()

        # Use a hash function to convert the string to an integer
        action = hash(move_str)

        # Modulo by the action size to ensure the action is within the valid range
        action = action % self.get_action_size()

        return action
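
Looking at the printed shapes, q_values and valid_actions are both [1, 4672] here, so the "too many indices" error comes from squeezing only q_values (down to 1-D) while the mask stays 2-D. Below is a minimal, self-contained sketch of just the masking step; the linear layer and the random state/mask are stand-ins for your env.q_network, get_state and get_valid_actions, not your actual code:

import numpy as np
import torch

q_network = torch.nn.Linear(768, 4672)           # stand-in for env.q_network
state = np.random.rand(768).astype(np.float32)   # stand-in for the 768-dim state

state_tensor = torch.from_numpy(state).unsqueeze(0)  # shape [1, 768]
q_values = q_network(state_tensor)                    # shape [1, 4672]

valid_actions = np.zeros(4672, dtype=bool)            # stand-in for env.get_valid_actions()
valid_actions[[0, 5, 42]] = True
valid_actions = torch.from_numpy(valid_actions).unsqueeze(0)  # shape [1, 4672]

# Option A: keep both tensors 2-D and let masked_fill handle the masking
masked_q = q_values.masked_fill(~valid_actions, float('-inf'))

# Option B: squeeze BOTH tensors before boolean indexing, never just one of them
# q_values = q_values.squeeze(0)             # [4672]
# valid_actions = valid_actions.squeeze(0)   # [4672]
# q_values[~valid_actions] = float('-inf')

best_action = int(torch.argmax(masked_q, dim=1))
print(masked_q.shape, best_action)  # torch.Size([1, 4672]) and an index of a valid action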