Batching a multicategorical spec

rsarpongstreetor · July 24, 2025, 11:34am

Can I batch a MultiCategorical spec as shown below, and if so, how? I have tried inheriting the Composite batch size, but it is not working. I need this to work in my MARL environment.

if self.categorical_actions:
    self.action_spec = Composite(
        {
            ("agents", "action"): MultiCategorical(
                # Repeat nvec for each agent
                nvec=torch.tensor(
                    [self.num_individual_actions] * self.obs_dim, device=self.device
                ).repeat(self.num_agents, 1),
                # Shape is (num_agents, obs_dim) per env
                shape=torch.Size([self.num_agents, self.obs_dim]),
                dtype=torch.int64,
                device=self.device,
            )
        },
    )
else:
    self.action_spec = Composite(
        {
            ("agents", "action"): UnboundedContinuousTensorSpec(
                # Shape is (num_agents, obs_dim) per env
                shape=torch.Size([self.num_agents, self.obs_dim]),
                dtype=torch.float32,
                device=self.device,
            )
        },
    )

vmoens · August 4, 2025, 9:32pm

Thanks for the question!

Can you share:

a complete example that I can reproduce easily, and
an error stack along with the expected behaviour?

rsarpongstreetor · August 6, 2025, 5:29am

I aimed to create and run a full BenchMARL experiment utilizing a custom VMAS scenario, a custom BenchMARL task, and custom policy and critic models. This involved several key stages: defining the custom scenario, defining the custom task, defining the custom models, configuring the experiment, and finally instantiating and running the experiment.

The process successfully defined the custom MyScenario, Task, CustomPolicyModel, and CustomCriticModel classes. However, significant challenges were encountered during the configuration and instantiation phases of the BenchMARL Experiment.

Multiple attempts were made to correctly structure the configuration dictionaries (task_config, model_config, mappo_config_params, exp_config_for_experimentconfig, policy_model_config_structured, critic_model_config_structured, run_config) and instantiate the necessary BenchMARL objects (MappoConfig, ExperimentConfig, Experiment). Debugging involved inspecting the signatures of BenchMARL classes (MappoConfig.__init__, ExperimentConfig.__init__, Experiment.run) to ensure parameters were correctly passed.

The most critical and persistent issue arose when defining the action and value specifications using TorchRL’s Composite and MultiCategorical for the multi-agent discrete action space. Despite numerous attempts to adjust the shape and batch size arguments for these specs, a ValueError or IndexError related to shape mismatches consistently occurred during the instantiation of the Experiment object. This indicated a fundamental problem with how the scalar discrete action for each agent was being represented within the batched multi-agent specification structure expected by TorchRL/BenchMARL.
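A shape mismatch of this kind can often be narrowed down by checking one rule by hand. The sketch below is plain Python, not TorchRL code, and it assumes the usual TorchRL convention that the leading dimensions of a leaf spec must equal the batch size of the Composite that holds it. The helper name `mismatched_leaves` and the sizes are hypothetical.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def mismatched_leaves(batch_size: Shape, leaves: Dict[str, Shape]) -> List[str]:
    """Return the names of leaves whose shape does not start with batch_size."""
    n = len(batch_size)
    return [
        name
        for name, shape in leaves.items()
        if tuple(shape[:n]) != tuple(batch_size)
    ]


num_agents, n_features = 4, 3  # hypothetical sizes

# A scalar leaf has no agent dimension, so it is flagged.
print(mismatched_leaves((num_agents,), {"action": ()}))
# One discrete action per agent: the leaf carries the agent dimension.
print(mismatched_leaves((num_agents,), {"action": (num_agents,)}))
# A vector of discrete choices per agent.
print(mismatched_leaves((num_agents,), {"action": (num_agents, n_features)}))
```

Under that assumption, a "scalar action per agent" would still be declared with shape (num_agents,) inside the agent-level Composite, rather than with an empty shape.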

Data Analysis Key Findings

The custom MyScenario class was successfully defined, including methods for data loading, environment setup, observation, reward, and termination based on a provided CSV data file.
The Task class inheriting from benchmarl.environments.common.TaskClass was correctly implemented, providing necessary methods like get_env_fun, get_specs, max_steps, and group_map.
Custom policy (CustomPolicyModel) and critic (CustomCriticModel) network modules, including GNN layers and handling of batched multi-agent data, were successfully defined.
The primary blocker was the inability to correctly define the torchrl.data.Composite specification for the multi-agent discrete action space using torchrl.data.tensor_specs.MultiCategorical. Attempts to define a single discrete action (3 options) per agent within the batched structure consistently resulted in ValueError or IndexError due to shape mismatches between the scalar action spec and the agent batch dimension.
Similar issues were encountered when defining the value specification for the critic’s output using Composite and Unbounded specs.
Correctly partitioning configuration parameters between the ExperimentConfig object and the Experiment.run method required multiple iterations of inspecting class signatures and adjusting the configuration dictionaries.

Insights or Next Steps

The specific combination of TorchRL specs (Composite and MultiCategorical) for defining a multi-agent discrete action space with scalar actions per agent appears to have a complex or undocumented requirement regarding shape and batch size interaction that could not be resolved through standard interpretation of error messages. Further research or consultation of specific TorchRL/BenchMARL multi-agent discrete example implementations is needed.
A simpler approach to defining the action space specification within the Task.get_specs method, potentially by relying more heavily on the environment’s own spec generation if possible, might circumvent the manual Composite/MultiCategorical construction issues.
I have also used the TorchRL multi-agent definitions (the make_spec method) with combinations of specs and Composite, and ran into similar issues with check_env_specs.
Can you look into the Composite/MultiCategorical issue and, if possible, the spec construction as a whole?

vmoens · August 25, 2025, 6:41am

Hello, as mentioned earlier, an error stack or a reproducible example would be amazing!

rsarpongstreetor · August 27, 2025, 7:41am

I succeeded in implementing the MultiCategorical by not batching the action spec, as shown below. The batched version was provided by EnvBase or Gemini.

nvec = torch.full(
    (self.num_individual_actions_features,), self.num_individual_actions, dtype=torch.int64, device=self.device
)

agent_action_spec = MultiCategorical(
    nvec,
    shape=torch.Size([self.num_individual_actions_features]),
    dtype=torch.int64,
    device=self.device,
)
self.action_spec_unbatched = Composite(
    {("agents", "action"): agent_action_spec}, batch_size=[self.num_agents], device=self.device
)

self.action_spec = Composite(
    {
        ("agents", "action"): MultiCategorical(
            nvec,
            shape=torch.Size([*self.batch_size, self.num_individual_actions_features]),
            dtype=torch.int64,
            device=self.device,
        )
    },
    batch_size=self.batch_size,
    device=self.device,
)
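To make the role of nvec concrete, here is a minimal plain-Python sketch (no TorchRL or PyTorch involved) of what a multi-categorical action looks like: component i of each agent's action takes an integer value in range(nvec[i]). The function `sample_multicategorical` and the sizes are hypothetical and only illustrate the shapes.

```python
import random
from typing import List


def sample_multicategorical(
    nvec: List[int], num_agents: int, rng: random.Random
) -> List[List[int]]:
    """Draw one action per agent; component i takes a value in range(nvec[i])."""
    return [[rng.randrange(n) for n in nvec] for _ in range(num_agents)]


num_agents = 2  # hypothetical sizes
num_individual_actions = 3
num_individual_actions_features = 4

# Same cardinality for every component, mirroring a constant-filled nvec.
nvec = [num_individual_actions] * num_individual_actions_features
action = sample_multicategorical(nvec, num_agents, random.Random(0))

# Outer length is the agent dimension, inner length is len(nvec).
print(len(action), len(action[0]))  # prints "2 4"
```

The last dimension of the action always equals len(nvec); any agent or environment batch dimensions sit in front of it.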

rsarpongstreetor · August 27, 2025, 7:46am

For the ProbabilisticActor, the policy is wrapped in a TensorDictModule that takes ("agents", "observation") as input and outputs logits under ("agents", "action", "logits").

The ProbabilisticActor is configured with spec=base_env.action_spec, in_keys=[("agents", "action")], out_keys=[("agents", "action"), "logits"], distribution_class=CompositeDistribution, and distribution_kwargs={"distribution_map": {("agents", "action"): d.MultiCategorical}}.

It is still not working, though: it produces scale and loc instead of logits.

We are unfortunately still hitting the exact same persistent RuntimeError. The error message continues to indicate that the ProbabilisticActor is receiving a TensorDict with unexpected keys ('loc', 'scale', and 'action') at the ('agents',) level, when it is configured to expect ('agents', 'action', 'logits').

Despite our efforts to align the policy output structure with the expected input for the ProbabilisticActor handling a MultiCategorical distribution within a composite action space, this specific key mismatch error remains unresolved.

This strongly suggests an underlying issue with how TensorDict keys are managed and propagated within the torchrl data collection pipeline when dealing with this nested MultiCategorical action space and the ProbabilisticActor.
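One way to narrow down a key mismatch like this is to compare the nested keys the actor is asked to read against what the policy module actually wrote. The sketch below uses a plain nested dict as a stand-in for a TensorDict. The helper `missing_keys` is hypothetical, and the `policy_output` layout is an assumption reconstructed from the error message above.

```python
from typing import Any, Dict, List, Tuple

NestedKey = Tuple[str, ...]


def missing_keys(data: Dict[str, Any], keys: List[NestedKey]) -> List[NestedKey]:
    """Return the nested keys from `keys` that cannot be found in `data`."""
    missing = []
    for key in keys:
        node: Any = data
        for part in key:
            if not isinstance(node, dict) or part not in node:
                missing.append(key)
                break
            node = node[part]
    return missing


# Assumed stand-in for what the policy wrote, per the error message.
policy_output = {"agents": {"loc": None, "scale": None, "action": None}}

# The key the actor was expected to find is reported as missing.
print(missing_keys(policy_output, [("agents", "action", "logits")]))
```

Since loc and scale are the usual parameter names of a Gaussian head, their presence may indicate that the module producing this output was built for continuous actions. It is worth checking whether the categorical branch of the model is the one actually being instantiated.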

Any help?