Hi,
I am working on an actor-critic method where there is an infinite number of possible actions (for example, the movement of a robot, where the possible movement is not just left/right/up/down but any angle).
My obvious idea was to have the policy output mu and sigma^2:
class Policy(nn.Module):
    """
    implements both actor and critic in one model
    """
    def __init__(self):
        super(Policy, self).__init__()
        self.fc1 = nn.Linear(state_size+1, 128)
        self.fc2 = nn.Linear(128, 64)
        # actor's layer
        self.action_head = nn.Linear(64, 1)
        self.mu = nn.Sigmoid()
        self.var = nn.Softplus()
        # critic's layer
        self.value_head = nn.Linear(64, 1)

    def forward(self, x):
        """
        forward of both actor and critic
        """
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        action_prob = self.action_head(x)
        mu = self.mu(action_prob)
        var = self.var(action_prob)
        state_values = self.value_head(x)
        return mu, var, state_values
Now, I also need to sample the action from this (mu, sigma**2=var) pair, and to compute the probability density of that action under the corresponding normal distribution. This is what I am doing:
sigma = torch.sqrt(var)
action = torch.normal(mu, sigma)
action = torch.clip(action, 0, 1)
pdf_probability = stats.norm.pdf(action.cpu().detach().numpy(), loc=mu.cpu().detach().numpy(), scale=sigma.cpu().detach().numpy())
Now I have some questions. Is what I am doing OK? It does not feel right: I suppose gradients should be backpropagated through pdf_probability as well, but the Tensor->NumPy->Tensor conversion detaches it from the graph, so they are lost.
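A minimal sketch of one way to keep the density in the autograd graph, assuming `torch.distributions.Normal` is acceptable here: its `log_prob` is computed with tensor operations, so gradients can flow back to mu and var without a NumPy round trip. The helper name `sample_action` and the stand-in values for mu and var are hypothetical, not part of the code above.

```python
import torch
from torch.distributions import Normal


def sample_action(mu, var):
    """Sample an action and return it with a differentiable log-density.

    Hypothetical helper; mu and var stand for the policy outputs.
    """
    sigma = torch.sqrt(var)
    dist = Normal(mu, sigma)
    # sample() does not track gradients, so the action is a constant here
    action = dist.sample()
    # log_prob stays in the graph: gradients reach mu and sigma
    log_prob = dist.log_prob(action)
    # Clip only the value sent to the environment; the log-density above
    # is that of the unclipped sample.
    env_action = torch.clamp(action, 0.0, 1.0)
    return env_action, log_prob


# Stand-in values for the policy outputs (assumed, for illustration only)
mu = torch.tensor([0.5], requires_grad=True)
var = torch.tensor([0.04], requires_grad=True)

env_action, log_prob = sample_action(mu, var)
log_prob.sum().backward()  # gradients now exist for mu and var
```

With this, the actor term of the loss would typically be built from `-log_prob * advantage` rather than from the raw pdf value. Note that the clipped action no longer follows the Gaussian exactly, so whether clipping is appropriate is a separate design choice.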
I have gone through the thread "[resolved] Actor Critic with a large amount of possible actions" (reinforcement-learning, PyTorch Forums), where a similar issue was discussed. They also compute mu and sigma^2 from the policy, but they never explain how to compute the action from (mu, sigma^2).
I also read the A3C paper, "Asynchronous Methods for Deep Reinforcement Learning" (arxiv.org), which states that mu should be computed by a linear layer. Is that mandatory? In my case I only want to consider positive angles, so mu should be followed by a Sigmoid or ReLU, right?
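To make the question concrete, here is a minimal sketch of the bounded-mean variant, assuming separate linear heads for the mean and the variance (in the `Policy` class above, both are derived from the same `action_head` output, which ties them together). The mean goes through a Sigmoid to stay in (0, 1) and the variance through a Softplus to stay positive. The class name `GaussianPolicy`, the head names `mu_head` and `var_head`, and `state_size = 4` are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

state_size = 4  # assumed value, for illustration only


class GaussianPolicy(nn.Module):
    """Actor-critic with separate mean and variance heads (hypothetical)."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(state_size + 1, 128)
        self.fc2 = nn.Linear(128, 64)
        # actor: one linear head per distribution parameter
        self.mu_head = nn.Linear(64, 1)
        self.var_head = nn.Linear(64, 1)
        # critic
        self.value_head = nn.Linear(64, 1)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        mu = torch.sigmoid(self.mu_head(x))   # mean squashed into (0, 1)
        var = F.softplus(self.var_head(x))    # variance kept positive
        state_values = self.value_head(x)
        return mu, var, state_values


policy = GaussianPolicy()
mu, var, state_values = policy(torch.randn(2, state_size + 1))
```

Dropping the `torch.sigmoid` call would give the plain linear mean described in the paper; the sketch only shows where the squashing would sit, not which choice trains better.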