RuntimeError: mat1 and mat2 shapes cannot be multiplied in actor-critic

I am getting the following error: `RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x1 and 3x256)`

It happens at this line: `x = F.relu(self.linear1(state))`

import torch
import torch.nn as nn
import torch.nn.functional as F

class Critic(nn.Module):
    def __init__(self, state_size, action_size, hidden_size, output_size):
        super(Critic, self).__init__()
        self.linear1 = nn.Linear(state_size, hidden_size)
        self.linear2 = nn.Linear(hidden_size + action_size, hidden_size)
        self.linear3 = nn.Linear(hidden_size, output_size)

    def forward(self, state, action):
        # state is expected to have shape [batch_size, state_size]
        x = F.relu(self.linear1(state))
        # concatenate the action onto the hidden features
        x = torch.cat([x, action], dim=1)
        x = F.relu(self.linear2(x))
        x = self.linear3(x)

        return x

The 64 comes from the `batch_size` used in the sample method below:

  def sample(self, batch_size):
        state_batch = []
        action_batch = []
        reward_batch = []
        next_state_batch = []
        done_batch = []

        batch = random.sample(self.buffer, batch_size)

        for experience in batch:
            state, action, reward, next_state, done = experience
            state_batch.append(state)
            action_batch.append(action)
            reward_batch.append(reward)
            next_state_batch.append(next_state)
            done_batch.append(done)

        return state_batch, action_batch, reward_batch, next_state_batch, done_batch

Does anyone know how this error can be fixed?

So, this mismatch indicates that state_size is different from what you expected. I assume state_batch is a tensor of shape [64, 1], but you expect it to be [64, 3]?

Also, if the network expects a tensor to be passed in, shouldn't you use torch.stack to convert the list of tensors into a single batched tensor? Could you check the type and shape of state_batch?
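
For example, here is a minimal sketch of both the failure and the fix, assuming each state is a 1-D tensor with 3 features (made-up values):

import torch
import torch.nn as nn

linear1 = nn.Linear(3, 256)                       # weight is [256, 3], so input needs last dim 3

# a list of 64 per-step states, each of shape [3]
states = [torch.randn(3) for _ in range(64)]
state_batch = torch.stack(states)                 # -> shape [64, 3]
out = linear1(state_batch)                        # works, out has shape [64, 256]

# a [64, 1] input reproduces your exact error:
# linear1(torch.randn(64, 1))                     # mat1 and mat2 shapes cannot be multiplied (64x1 and 3x256)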

The type of `state_batch` is `list`, with a length of `64`.

And the complete code for the state_batch is here:

import random
from collections import deque
import numpy as np

class Memory:
    def __init__(self, max_size):
        self.max_size = max_size
        self.buffer = deque(maxlen=max_size)

    def push(self, state, action, reward, next_state, done):

        experience = (state, action, np.array([reward]), next_state, done)
        self.buffer.append(experience)

    def sample(self, batch_size):
        state_batch = []
        action_batch = []
        reward_batch = []
        next_state_batch = []
        done_batch = []

        batch = random.sample(self.buffer, batch_size)

        for experience in batch:
            state, action, reward, next_state, done = experience
            state_batch.append(state)
            print(f"state_batch in sample method is: {type(state_batch)}")
            action_batch.append(action)
            reward_batch.append(reward)
            next_state_batch.append(next_state)
            done_batch.append(done)

        return state_batch, action_batch, reward_batch, next_state_batch, done_batch

What are the contents of the list? Are they tensors? If so, what shape? If the elements are of shape [1, 3] and you have a list of 64 items, you can stack the list of tensors into a single tensor, which might solve the mismatch error!
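
For instance (a quick sketch with made-up values, assuming elements of shape [1, 3]):

import torch

states = [torch.randn(1, 3) for _ in range(64)]
state_batch = torch.cat(states, dim=0)            # -> [64, 3], ready for linear1
# torch.stack(states) would instead give [64, 1, 3], which you would squeeze(1) first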

Could you print what state_batch[0].shape returns?

The elements of the state_batch list are states, and they are just numbers (scalars). Below is the full stack trace, and the part of the update function it refers to:

Traceback (most recent call last):
  File "main.py", line 34, in <module>
    agent.update(batch_size)
  File "ddpg.py", line 62, in update
    Qvals = self.critic.forward(states, actions)
  File "model.py", line 56, in forward
    x = F.relu(self.linear1(state))

RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x1 and 3x128)

    def update(self, batch_size):
        states, actions, rewards, next_states, _ = self.memory.sample(batch_size)
        states = torch.FloatTensor(states)
        states = states.unsqueeze(1).expand(-1, 1)
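        # note: if `states` starts out as a flat [64] tensor of scalars,
        # unsqueeze(1) makes it [64, 1] and expand(-1, 1) leaves it [64, 1],
        # which is exactly the shape that reaches linear1 and fails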
        actions = torch.FloatTensor(actions)
        rewards = torch.FloatTensor(rewards)
        next_states = torch.FloatTensor(next_states)

        # Critic loss
        Qvals = self.critic.forward(states, actions)
        next_actions = self.actor_target.forward(next_states)
        next_Q = self.critic_target.forward(next_states, next_actions.detach())
        Qprime = rewards + self.gamma * next_Q
        critic_loss = self.critic_criterion(Qvals, Qprime)

So, if state is a batch of scalars, it makes sense why you're getting a mismatch at x = F.relu(self.linear1(state)). Perhaps change self.linear1 to be of size [1, 128] rather than [3, 128]?

How can this change be made?

Changing state_size to 1 rather than 3 will let it pass through that layer. But if you're sure state_size should be 3, then the issue lies with state instead.
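
For example, where the critic is constructed (a sketch; the other sizes are assumptions based on your trace):

critic = Critic(state_size=1, action_size=action_size, hidden_size=128, output_size=1)

which makes self.linear1 an nn.Linear(1, 128) and lets a [64, 1] state batch pass through.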

I am sure about state_size being 3, but the state issue seems unsolvable!

So, state_batch is a list of 64 elements which are purely scalars? Then state_batch[0].shape would return an empty shape, ()? If that's true, and you're sure state_size is 3, then something's wrong with the state_batch variable!

state_batch[0].shape returns ().
And state_batch[0] itself is a scalar.

So, that explains the mismatch. Either state_batch should have shape [64, 3], or state_size should be 1.
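
If the environment really does return 3 state features, the cleaner fix is to keep the full state vectors in the buffer and batch them in update. A sketch, assuming each stored state is a length-3 numpy array:

import numpy as np
import torch

# in update(), instead of unsqueeze(1).expand(-1, 1):
states = torch.FloatTensor(np.array(states))            # -> [64, 3] if each state has 3 entries
next_states = torch.FloatTensor(np.array(next_states))  # -> [64, 3]

The unsqueeze(1).expand(-1, 1) line can only turn a [64] tensor of scalars into [64, 1]; it cannot recover the 3 features. Since state_batch[0] is already a scalar, the features are being lost before Memory.push is called, so check what the environment actually returns at that point.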