Dictionary Input/Output in C++

Could anybody point me in the direction of where the conversion from Python dictionaries to C++ call for AOT Inductor compiled models happens?

For example, this works in Python, but I have no idea how to run the model from C++ when the inputs and/or outputs are Python dictionaries.

Example Code:

#! /usr/bin/python3

import torch


class PyTAotiModelDict(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, INPUT0):
        return {
            "result": INPUT0["input0"] + INPUT0["input1"],
            "input0": INPUT0["input0"],
            "input1": INPUT0["input1"],
        }


SHAPE = (1, 3, 224, 224)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = PyTAotiModelDict()
model.to(device)
model = model.eval()

sample_input = {'input0': torch.randn(*SHAPE, device=device), 'input1': torch.randn(*SHAPE, device=device)}

# Export and package the model
print("Exporting and packaging the model...")

ep = torch.export.export(model, (sample_input,))
torch._inductor.aoti_compile_and_package(ep, package_path="pyt_aoti.pt2")

# Now load and run the packaged model
print("Loading and running the packaged model...")

compiled_model = torch._inductor.aoti_load_package("pyt_aoti.pt2")

with torch.inference_mode():
    inputs = {'input0': torch.randn(*SHAPE, device=device), 'input1': torch.randn(*SHAPE, device=device)}
    output = compiled_model(inputs)

    print(f"{type(output)=}")

    if isinstance(output, dict):
        for k, v in output.items():
            print(f"{k}: {v.shape}")

Specifically, the C++ API for AOTIModelContainerRunner is run(const std::vector<at::Tensor>&, void* = nullptr) -> std::vector<at::Tensor>. With just a flat list of tensors, I'm at a loss as to how to convert to and from a dictionary of tensors.
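In other words, here is a Python sketch of the situation on the C++ side (plain floats standing in for at::Tensor, and a hypothetical runner_run function standing in for AOTIModelContainerRunner::run):

```python
# Sketch of the problem: the C++ runner API is purely positional.
# Plain floats stand in for at::Tensor; runner_run is a hypothetical
# stand-in for AOTIModelContainerRunner::run (list in, list out).

def runner_run(flat_inputs):
    """Stand-in for the compiled runner: ordered list in, ordered list out."""
    a, b = flat_inputs
    return [a + b, a, b]

# What the C++ API gives me is effectively this...
flat_out = runner_run([1.0, 2.0])
print(flat_out)  # [3.0, 1.0, 2.0]

# ...but what the Python model speaks is dicts:
#   {"result": ..., "input0": ..., "input1": ...}
# The missing piece is the mapping between positions and keys.
```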

Any help would be excellent, thank you.

This is done for you when loading via aoti_load_package. Tracing through that, you can find that it instantiates the AOTICompiledModule type; its __call__ implementation is where that translation is done for you.

The pt2 archive has metadata specifying the pytree spec used to flatten the inputs and unflatten the outputs. For a one-off situation, you should be able to statically define the inputs in the correct flattened form; for a general solution, I don't know whether we have pytree utilities on the C++ side to flatten/unflatten from a spec.
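To make the "statically define the flattened form" idea concrete, here is a minimal pure-Python sketch of what that flatten/run/unflatten round trip looks like for this one model. Plain floats stand in for tensors, toy_runner is a stand-in for AOTIModelContainerRunner::run, and the key orders below are illustrative assumptions; the authoritative leaf order is whatever the pytree spec in the .pt2 archive says.

```python
# Hedged sketch: hard-code the flattened input/output order for this
# specific model, mimicking what AOTICompiledModule.__call__ does with
# the stored pytree spec. IN_KEYS/OUT_KEYS orders are assumptions.

IN_KEYS = ["input0", "input1"]              # assumed flattened input order
OUT_KEYS = ["result", "input0", "input1"]   # assumed flattened output order

def toy_runner(flat_inputs):
    """Stand-in for the compiled runner: ordered list in, ordered list out."""
    a, b = flat_inputs
    return [a + b, a, b]

def call_with_dicts(runner, inputs):
    """Flatten dict -> run flat -> unflatten back to a dict."""
    flat_in = [inputs[k] for k in IN_KEYS]
    flat_out = runner(flat_in)
    return dict(zip(OUT_KEYS, flat_out))

out = call_with_dicts(toy_runner, {"input0": 1.0, "input1": 2.0})
print(out)  # {'result': 3.0, 'input0': 1.0, 'input1': 2.0}
```

The same static mapping translates directly to C++ for a known model: build the std::vector<at::Tensor> in the spec's order before calling run(), and reassemble the output vector into named tensors afterwards.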

Thank you for this. I'll dig into it and see whether I can map it to the C++ calls in a way we can replicate for our use case (which doesn't have a Python interpreter available).