TorchRL Probabilistic Actor Returning Continuous Actions Despite Given Bounded Discrete Action Spec

Hi! I am currently using TorchRL to build an instance of a reference game, so I need my action spaces to be discrete. I have defined my speaker action space as

```python
self.action_spec = Composite(
    {"action": Bounded(
        shape=self.action_space,
        low=0,
        high=self.vocab_size,
        domain="discrete",
        dtype=torch.int64,
    )},
    shape=torch.Size([self.n_envs, 1]),
)
```

and my listener action space as

```python
self.action_spec = Composite(
    {"action": Bounded(
        shape=self.action_space,
        low=0,
        high=self.n_images - 1,
        domain="discrete",
        dtype=torch.int64,
    )},
    shape=torch.Size([self.n_envs, 1]),
)
```

I am then using the MADDPG tutorial to build a training loop, but I notice that the actions for my listener and speaker end up being continuous (i.e., real values anywhere between 0 and vocab_size, and between 0 and self.n_images - 1, respectively). This seems to violate the specs I designed, so I am confused about what is going on and would appreciate any pointers.

Hello!
Do you know (or can you show) where your actions are cast to a tensor?
The specs are not enforced because that would be costly: at every iteration, the dtype, device, shape, etc. of every tensor would have to be checked against the spec, which is time-consuming.
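To make the cost argument concrete, here is a plain-PyTorch sketch of everything a spec would have to verify if it were enforced on every step. `conforms_to_spec` is a hypothetical helper written for illustration, not TorchRL's API, and the bounds and shapes are made-up values. Note that a float action inside the bounds only fails on dtype, which is exactly the kind of mismatch that goes unnoticed when nothing checks it.

```python
import torch

def conforms_to_spec(value, *, shape, dtype, device, low, high):
    """Hypothetical helper: the full check a spec *could* run on every step.

    Returns a list of human-readable mismatches (empty if the tensor conforms).
    """
    problems = []
    if value.shape != torch.Size(shape):
        problems.append(f"shape {tuple(value.shape)} != {tuple(shape)}")
    if value.dtype != dtype:
        problems.append(f"dtype {value.dtype} != {dtype}")
    if value.device != torch.device(device):
        problems.append(f"device {value.device} != {device}")
    if not ((value >= low) & (value <= high)).all():
        problems.append(f"values outside [{low}, {high}]")
    return problems

# A continuous policy output like the one described in the question,
# next to an integer action with the same shape and values in range.
float_action = torch.tensor([[3.7], [0.2]])
int_action = torch.tensor([[3], [0]])

# Assumed spec: 2 envs, 1 action each, int64 indices in [0, 9] on CPU.
spec_kwargs = dict(shape=(2, 1), dtype=torch.int64, device="cpu", low=0, high=9)
print(conforms_to_spec(float_action, **spec_kwargs))  # only the dtype differs
print(conforms_to_spec(int_action, **spec_kwargs))    # []
```

Running four comparisons like these on every tensor of every step, for every key in a nested spec, is the overhead the library avoids by default.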
We use specs to

  • preallocate buffers in distributed / multiproc settings
  • check (not project) whether tensors are in the space defined by the spec in the env
  • project the value of the action onto the required space in the Actor (e.g., clamp the action) using some heuristic in TensorSpec.project.
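The difference between "check" and "project" can be sketched in plain PyTorch. `is_in` and `project` below are hypothetical stand-ins that only mimic a bounds check and the clamp heuristic mentioned above; they are not TorchRL's `TensorSpec.is_in` / `TensorSpec.project` implementations, and the bounds are assumed. The point is that a bounds check plus clamping leaves a fractional float such as 3.7 "in range", which is consistent with the continuous actions reported in the question.

```python
import torch

LOW, HIGH = 0, 9  # assumed bounds, e.g. vocabulary indices 0..9

def is_in(value: torch.Tensor) -> bool:
    """Check only: report whether every entry lies in [LOW, HIGH]."""
    return bool(((value >= LOW) & (value <= HIGH)).all())

def project(value: torch.Tensor) -> torch.Tensor:
    """Project only by clamping into [LOW, HIGH]; no rounding, no cast."""
    return value.clamp(LOW, HIGH)

action = torch.tensor([3.7, -1.5, 12.3])
print(is_in(action))     # False: two entries are out of bounds
projected = project(action)
print(is_in(projected))  # True, yet 3.7 is still not a valid discrete action
print(projected.dtype)   # torch.float32
```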

What I think is happening here is that your action is created as a float tensor, and unless you run check_env_specs(env) or use your env in multiproc settings, you will not get an error.
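If you want to keep a continuous actor for now, one crude workaround (my assumption, not something TorchRL does for you) is to round, clamp, and cast the raw output to the spec's integer dtype before it reaches the env. `to_discrete_action` is a hypothetical helper. Keep in mind that `round()` has zero gradient almost everywhere, so this belongs outside any path you backpropagate through.

```python
import torch

def to_discrete_action(raw: torch.Tensor, low: int, high: int) -> torch.Tensor:
    """Hypothetical post-processing: snap a continuous policy output onto
    the integer grid {low, ..., high} and cast it to int64 like the spec."""
    return raw.round().clamp(low, high).to(torch.int64)

raw = torch.tensor([[3.7], [-0.4], [11.2]])  # float output of a continuous actor
action = to_discrete_action(raw, low=0, high=9)
print(action.dtype)               # torch.int64
print(action.flatten().tolist())  # [4, 0, 9]
```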