Ways to pass compilation target device to dynamo backend

Is there a way other than checking the torch.compile() input’s device type to pass the compilation target to the backend?

The dynamo backend I am working on can operate on multiple targets, and I need to somehow differentiate between them. If all the device types were supported by PyTorch, the code would look like this:

device_type = device_from_inputs(example_inputs).type  # e.g. "cpu", "cuda"

if device_type == "cpu":
    ...  # compile for CPU
elif device_type == "cuda":
    ...  # compile for CUDA
elif device_type == "d1":
    ...  # compile for D1
elif device_type == "d2":
    ...  # compile for D2

The issue is that while the backend can compile PyTorch models for those devices and run the compiled models on them, some of those devices (e.g. D1 and D2) are not supported by PyTorch as ATen devices. So I cannot create a tensor with those device types to pass to torch.compile(). I want to know whether the only way to do that differentiation is to implement support for those target devices in PyTorch.
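One workaround that avoids touching the inputs' device at all is to bind the target to the backend callable itself, e.g. with functools.partial, before handing it to torch.compile(). The sketch below assumes this pattern works for your backend; my_backend and the target names are hypothetical, and returning gm.forward just runs the graph eagerly as a stand-in for real compilation:

```python
import functools

import torch


def my_backend(gm, example_inputs, *, target="cpu"):
    # Hypothetical backend entry point: `target` is bound explicitly at
    # torch.compile() time instead of being inferred from the inputs' device,
    # so "d1"/"d2" need no ATen device support.
    if target == "cpu":
        return gm.forward  # run the captured graph eagerly in this sketch
    elif target in ("d1", "d2"):
        raise NotImplementedError(f"compile for {target} here")
    raise ValueError(f"unknown target {target!r}")


# Bind the compilation target when registering the backend with torch.compile.
compiled = torch.compile(
    lambda x: x + 1,
    backend=functools.partial(my_backend, target="cpu"),
)
print(compiled(torch.ones(2)))  # tensor([2., 2.])
```

The same partial-binding trick also lets you pass any other backend-specific knobs (optimization level, device index, etc.) without changing the graph inputs.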

I’m not sure if this completely answers your question, but: there’s a way to create custom devices out-of-tree. Check out these examples in core:

python code: pytorch/test_cpp_extensions_open_device_registration.py at e79d9b993890dfa09d1c34c88373a23d0babd121 · pytorch/pytorch · GitHub

cpp code: pytorch/open_registration_extension.cpp at main · pytorch/pytorch · GitHub

Open device registration for eager mode is under active development, and we'd like to have feature parity with existing in-tree devices, so feel free to file a GitHub issue if you think there's any missing functionality!


@bdhirsh thank you for your repo GitHub - bdhirsh/pytorch_open_registration_example: Example of using pytorch's open device registration API. I am able to register the device with this repo, and I added

TORCH_LIBRARY_IMPL(_, PrivateUse1, m) {
  m.fallback(torch::CppFunction::makeFallthrough());
}

but I am still not able to run a simple torch.add or torch.negative test. Do I need to register all the kernels too? I just need the device annotation on the tensors; the remaining compute can stay on the CPU. Based on the tensor annotation I can do memory copies from host to device and device to host.
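If I understand the goal (keep tensors annotated with the custom device, but run the actual compute on CPU), a fallthrough is probably not what you want: it just re-dispatches past PrivateUse1, and there is still no kernel for a custom-device tensor. ATen ships a boxed CPU fallback that copies inputs to CPU, runs the CPU kernel, and copies results back, and you can register it as the PrivateUse1 fallback instead. A sketch, assuming your extension already provides the handful of ops the fallback itself relies on (allocation via aten::empty.memory_format and aten::_copy_from for the host/device copies):

```cpp
#include <ATen/native/CPUFallback.h>
#include <torch/library.h>

// Route every op that has no PrivateUse1 kernel through the generic CPU
// fallback: arguments are copied to CPU, the CPU kernel runs, and outputs
// are copied back to the custom device, so you don't have to register every
// kernel by hand.
TORCH_LIBRARY_IMPL(_, PrivateUse1, m) {
  m.fallback(
      torch::CppFunction::makeFromBoxedFunction<&at::native::cpu_fallback>());
}
```

With this in place, ops like torch.add and torch.neg should work on custom-device tensors without per-op registrations, at the cost of a host round-trip per op.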