Hi all,
I've just started using PyTorch and TorchRL, and I'm trying to adapt the PPO tutorial to my own setting.
I want to train robots that have a mixture of wheels and legs, so I have a composite action spec with joints and wheels outputs.
However, I can't manage to define a working probabilistic actor.
My action spec:

self.action_spec = CompositeSpec(
    joints=BoundedTensorSpec(
        low=-torch.pi,
        high=torch.pi,
        shape=(len(self.robot.joint_ids),),
        dtype=torch.float32,
        device=self.device,
    ),
    wheels=BoundedTensorSpec(
        low=-100,
        high=100,
        shape=(len(self.robot.wheel_ids),),
        dtype=torch.float32,
        device=self.device,
    ),
)
And the definition of my actor:

actor_net = nn.Sequential(
    nn.LazyLinear(num_cells, device=device),
    nn.Tanh(),
    nn.LazyLinear(num_cells, device=device),
    nn.Tanh(),
    nn.LazyLinear(num_cells, device=device),
    nn.Tanh(),
    nn.LazyLinear(
        2 * env.action_spec["joints"].shape[-1] + 2 * env.action_spec["wheels"].shape[-1],
        device=device,
    ),
    NormalParamExtractor(),
)
# Define the policy module
policy_module = TensorDictModule(
    actor_net,
    in_keys=["observation"],
    out_keys=["joints_loc", "joints_scale", "wheels_loc", "wheels_scale"],
)
actor = ProbabilisticActor(
    module=policy_module,
    spec=env.action_spec,
    in_keys=["joints_loc", "joints_scale", "wheels_loc", "wheels_scale"],
    out_keys=["joints", "wheels"],
    distribution_class=TanhNormal,
    distribution_kwargs={
        "min": torch.cat([env.action_spec["joints"].space.low, env.action_spec["wheels"].space.low]),
        "max": torch.cat([env.action_spec["joints"].space.high, env.action_spec["wheels"].space.high]),
    },
)
When I run actor(env.reset()), I get this error:
KeyError: 'key "wheels_loc" not found in TensorDict with keys [\'done\', \'joints_loc\', \'joints_scale\', \'observation\', \'step_count\', \'terminated\']'
It seems that only the first two out_keys of my module are filled in. I suspect this is because NormalParamExtractor splits the network output into just two tensors (a loc and a scale), so TensorDictModule has nothing left to write into wheels_loc and wheels_scale.
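For reference, here is a minimal sketch of how I imagine the four parameters could be produced instead of using NormalParamExtractor. The class name, the group sizes (4 joints, 2 wheels), and the choice of softplus for the scale are all my own assumptions, not anything from the tutorial:

```python
import torch
from torch import nn


class MultiHeadNormalParamExtractor(nn.Module):
    """Split a flat network output into (loc, scale) pairs, one per action group.

    sizes: number of action dimensions per group, e.g. (n_joints, n_wheels).
    Returns the tensors in group order: (joints_loc, joints_scale,
    wheels_loc, wheels_scale) for two groups.
    """

    def __init__(self, sizes):
        super().__init__()
        self.sizes = tuple(sizes)

    def forward(self, x):
        outs = []
        offset = 0
        for size in self.sizes:
            # first `size` entries of the chunk are the location...
            loc = x[..., offset:offset + size]
            offset += size
            # ...the next `size` entries are mapped to a strictly positive scale
            scale = torch.nn.functional.softplus(x[..., offset:offset + size])
            offset += size
            outs.extend([loc, scale])
        return tuple(outs)


# hypothetical sizes: 4 joints and 2 wheels, batch of 8 observations
extractor = MultiHeadNormalParamExtractor((4, 2))
joints_loc, joints_scale, wheels_loc, wheels_scale = extractor(
    torch.randn(8, 2 * (4 + 2))
)
```

With this, the TensorDictModule would have four tensors to match its four out_keys, but I'm not sure it is the intended approach.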
How would you implement a probabilistic actor with a composite action output? Or how would you define a probabilistic actor with two normal distributions?
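Conceptually, what I want the actor to express is two independent normal distributions whose log-probabilities are summed, something like the following plain torch.distributions sketch (the fixed parameters and sizes are placeholders of mine):

```python
import torch
from torch.distributions import Independent, Normal

# hypothetical parameters for 4 joints and 2 wheels
joints_loc, joints_scale = torch.zeros(4), torch.ones(4)
wheels_loc, wheels_scale = torch.zeros(2), torch.ones(2)

# Independent(..., 1) treats the last dimension as the event,
# so log_prob reduces over the action dimensions of each group
joints_dist = Independent(Normal(joints_loc, joints_scale), 1)
wheels_dist = Independent(Normal(wheels_loc, wheels_scale), 1)

joints = joints_dist.sample()
wheels = wheels_dist.sample()

# the joint log-probability is the sum over the two independent groups
log_prob = joints_dist.log_prob(joints) + wheels_dist.log_prob(wheels)
```

Is there a built-in way to get ProbabilisticActor to build such a joint distribution over the two keys?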
Many thanks!