Hi! I am using TorchRL to build an instance of a reference game, so I need my action spaces to be discrete. I have defined my speaker action space as
    self.action_spec = Composite(
        {"action": Bounded(
            shape=self.action_space,
            low=0,
            high=self.vocab_size,
            domain="discrete",
            dtype=torch.int64,
        )},
        shape=torch.Size([self.n_envs, 1]),
    )
and my listener action space as
    self.action_spec = Composite(
        {"action": Bounded(
            shape=self.action_space,
            low=0,
            high=self.n_images - 1,
            domain="discrete",
            dtype=torch.int64,
        )},
        shape=torch.Size([self.n_envs, 1]),
    )
I am then following the MADDPG tutorial to build a training loop, but I notice that the actions for my listener and speaker end up being continuous (i.e. real values anywhere between 0 and vocab_size for the speaker, and between 0 and n_images - 1 for the listener). This seems to violate the specs I have defined, so I am confused about what is going on and would appreciate any pointers.
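To make the symptom concrete, here is a minimal plain-Python sketch of what "violating the spec" means in this case. It is not TorchRL code: the helper name `violates_discrete_spec` and the example bounds are hypothetical, and it assumes both bounds are inclusive.

```python
def violates_discrete_spec(value, low, high):
    """Return True if `value` is out of range or is not a whole number.

    Hypothetical helper for illustration only; assumes `low` and `high`
    are both inclusive bounds of the discrete action space.
    """
    in_range = low <= value <= high
    is_whole = float(value).is_integer()
    return not (in_range and is_whole)


# Example with a made-up listener space of 10 images (indices 0..9).
n_images = 10
sample_actions = [3, 3.7, 9, 10]
flags = [violates_discrete_spec(a, 0, n_images - 1) for a in sample_actions]
```

A value such as 3.7 lies inside the bounds but is still flagged because it is not an integer, which is the behaviour described above.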