How to find Action in an Actor Critic Method with infinite possible actions

Khabbab_Zakaria · January 18, 2022, 4:42pm

Hi,
I am working on an actor-critic method where there are infinitely many possible actions (let’s say the movement of a robot, where the possible movement is not just left-right-up-down but every possible angle).
My obvious idea was to have the policy output mu and sigma^2:

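import torch.nn as nn
import torch.nn.functional as F

# state_size is assumed to be defined elsewhere
class Policy(nn.Module):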
    """
    implements both actor and critic in one model
    """
    def __init__(self):
        super(Policy, self).__init__()
        self.fc1 = nn.Linear(state_size+1, 128)

        self.fc2 = nn.Linear(128, 64)

        # actor's layer
        self.action_head = nn.Linear(64, 1)
        self.mu = nn.Sigmoid()
        self.var = nn.Softplus()

        # critic's layer
        self.value_head = nn.Linear(64, 1)


    def forward(self, x):
        """
        forward of both actor and critic
        """
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))

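        # a single scalar output feeds both activations, so mu and var are coupled
        action_prob = self.action_head(x)
        mu = self.mu(action_prob)    # sigmoid keeps the mean in (0, 1)
        var = self.var(action_prob)  # softplus keeps the variance positive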

        state_values = self.value_head(x)

        return mu, var, state_values

Now, I also need to sample the action from this (mu, sigma**2=var) pair, and to calculate the probability density of that action under the same Normal distribution. This is what I am doing:

import torch
from scipy import stats

sigma = torch.sqrt(var)
action = torch.normal(mu, sigma)
action = torch.clip(action, 0, 1)
pdf_probability = stats.norm.pdf(action.cpu().detach().numpy(), loc=mu.cpu().detach().numpy(), scale=sigma.cpu().detach().numpy())

Now I have some questions. Is what I am doing OK? It does not feel right: I assume gradients should be backpropagated through pdf_probability as well, but the Tensor->Numpy->Tensor conversion detaches it from the computation graph, so that gradient is lost.
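For reference, one way to keep the whole computation inside autograd is to stay in PyTorch and use torch.distributions.Normal, whose log_prob returns a tensor that is still attached to the graph. The sketch below is only an illustration under assumptions: mu and var are stand-in tensors rather than real outputs of the Policy above, and advantage and the loss are placeholders.

```python
import torch
from torch.distributions import Normal

# Stand-in policy outputs (assumption: in practice these come from Policy.forward).
mu = torch.tensor([0.4], requires_grad=True)
var = torch.tensor([0.09], requires_grad=True)

sigma = torch.sqrt(var)
dist = Normal(mu, sigma)

# sample() draws an action without tracking gradients through the draw itself.
action = dist.sample()

# log_prob stays in the autograd graph, so it is differentiable w.r.t. mu and var.
log_prob = dist.log_prob(action)

# Clip only the value that is sent to the environment.
env_action = torch.clamp(action, 0.0, 1.0)

# Placeholder policy-gradient style loss: -log_prob * advantage.
advantage = torch.tensor([1.0])
loss = -(log_prob * advantage).sum()
loss.backward()
```

Note that this computes the log-density of the unclipped sample; clipping changes the effective action distribution, so treating the two as the same is a simplification.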
I have gone through the thread "[resolved] Actor Critic with a large amount of possible actions" (reinforcement-learning, PyTorch Forums), where a similar issue is discussed. They also calculate mu and sigma^2 from the policy, but they never explain how to calculate the action from (mu, sigma^2).
I also read the A3C paper, Asynchronous Methods for Deep Reinforcement Learning (arxiv.org), which states that mu should be calculated by a linear layer. Is that mandatory? In my case I only want to consider positive angles, so mu should be followed by a Sigmoid or ReLU, right?
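To make the last question concrete, here is a minimal sketch of one possible layout: separate linear heads for the mean and the variance, with a sigmoid squashing the mean into (0, 1) and a softplus keeping the variance positive. The class name GaussianPolicy, the state_size value, and the small epsilon added to the variance are assumptions made for illustration; this is not taken from the A3C paper or from the linked thread.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GaussianPolicy(nn.Module):
    """Hypothetical actor-critic with separate mean and variance heads."""

    def __init__(self, state_size=4):
        super().__init__()
        self.fc1 = nn.Linear(state_size, 128)
        self.fc2 = nn.Linear(128, 64)
        self.mu_head = nn.Linear(64, 1)     # actor: mean
        self.var_head = nn.Linear(64, 1)    # actor: variance
        self.value_head = nn.Linear(64, 1)  # critic

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        mu = torch.sigmoid(self.mu_head(x))          # mean bounded to (0, 1)
        var = F.softplus(self.var_head(x)) + 1e-5    # strictly positive variance
        value = self.value_head(x)
        return mu, var, value


policy = GaussianPolicy(state_size=4)
mu, var, value = policy(torch.randn(2, 4))  # batch of 2 random states
```

With separate heads, the mean and the variance no longer depend on the same scalar, so the network can change one without changing the other.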