Registering a new device to pass tensors to torch.compile()

FatemeH · May 22, 2023, 5:32pm

Hi all,

I am in the process of adding a new device to PyTorch (it will only be used for inference). At this point, I only want to register this device, use it to define a tensor, and pass that tensor across some torch APIs.

The use case would be something like this:

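device = torch.device("XYZ")
data = torch.rand(in_shape, device=device)
# torch.compile wraps a callable (here `model`, the module being compiled, not shown), not the input tensor
compiled_model = torch.compile(model, backend="my_backend")
result = compiled_model(data)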

As you can see, the tensor defined for this device (data in the example) is passed to torch.compile to be used by the compiler backend I’ve registered for _dynamo. All I want to do in my_backend with that device attribute is check its value and make some decisions if it is “XYZ” and not CPU. So I don’t need to actually store the data tensor on the device or implement any specific tensor operations; I just need to pass the tensor and check that attribute.

I’ve registered the device and its corresponding XYZ dispatch key in c10/core, following this PR:
Add support for the ONNX Runtime Eager Mode backend by abock · Pull Request #58248 · pytorch/pytorch (github.com)

So now device = torch.device("XYZ") returns XYZ to me, but the tensor initialization fails with this error:

Could not run ‘aten::rand’ with arguments from the ‘XYZ’ backend. This could be because the operator doesn’t exist for this backend, or was omitted during the selective/custom build process (if using custom build)

Am I taking the correct path here? How can I resolve this issue when I don’t need any specific tensor implementation to create or store the tensor on my device? I basically want to use whatever is already on the CPU and just change the device attribute.

I’d appreciate any help with this.

eqy · May 23, 2023, 5:53am

What do you mean by “check its value?” By definition a Tensor’s device attribute indicates where the memory occupied by the Tensor is (with the exceptions being e.g., Fake Tensor — torchdistX 0.2.0 documentation), so it’s not really possible to have a Tensor on another device if basic operations like memory allocation and data transfer do not exist. If the issue is just that there is no implementation for rand, you could try torch.empty(in_shape, device=device) if e.g., memory allocation is implemented.

If memory allocation is not implemented, and you do not actually need to check data values (only the device type), I would also check whether a Tensor of shape “0” would work, e.g.:

>>> import torch
>>> a = torch.empty(0)
>>> a
tensor([])
>>> a.data_ptr()
0
>>> a.device
device(type='cpu')

As a side note, if by “check its value” you are referring to the ability to print e.g., GPU tensors, note that the printing of GPU tensors is implemented by copying the data to CPU first and then printing—before that point the data must occupy GPU memory.
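
The exception mentioned above (a tensor that reports a device without owning memory there) can be illustrated with PyTorch's built-in "meta" device. This is a stand-in chosen for illustration, not the torchdistX fake tensor linked above, and it does not by itself make a custom "XYZ" device work; it only shows that shape, dtype and device metadata can exist without allocated data.

```python
import torch

# A tensor on the built-in "meta" device carries shape/dtype/device
# metadata but has no backing storage, so no allocator is needed.
t = torch.empty((2, 3), device="meta")

print(t.device.type)  # device type string that a backend could branch on
print(t.shape)        # shape is tracked even though no data exists
print(t.is_meta)      # flag that marks data-less meta tensors
```

Reading values from such a tensor is not possible, which matches the point above: only the metadata is available.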

FatemeH · May 23, 2023, 4:25pm

Thanks for your response @eqy. By checking its value, I mean that in my backend I need to differentiate between different devices, so I should have an if statement like

if input.device.type == "XYZ":

where input is the input tensor to the torch.compile API. The rest of the operations in the backend convert the PyTorch model to another representation and run that one on the device. As a result, I don’t need tensor operations for XYZ to be implemented in PyTorch.
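
A minimal sketch of what such a device check might look like inside a custom torch.compile backend. It assumes the usual backend signature (an FX GraphModule plus a list of example inputs, returning a callable). The helper name pick_target and the NotImplementedError placeholder are hypothetical and only mark where the conversion to the other representation would go; the "XYZ" branch cannot be exercised without the device actually being registered.

```python
from typing import Callable, List

import torch
import torch.fx


def pick_target(example_inputs: List[torch.Tensor], special: str = "XYZ") -> str:
    """Return `special` if any example input reports that device type, else "cpu"."""
    # Non-tensor entries (e.g. symbolic sizes) are skipped.
    types = {t.device.type for t in example_inputs if isinstance(t, torch.Tensor)}
    return special if special in types else "cpu"


def my_backend(gm: torch.fx.GraphModule, example_inputs: List[torch.Tensor]) -> Callable:
    """Hypothetical backend: branch on the device type of the inputs."""
    if pick_target(example_inputs) == "XYZ":
        # Placeholder (assumption): convert `gm` to the other representation
        # and return a callable that runs it on the XYZ device.
        raise NotImplementedError("XYZ lowering is not part of this sketch")
    # Fallback: run the captured graph unchanged.
    return gm.forward


# Direct call with a traced module, without going through torch.compile.
gm = torch.fx.symbolic_trace(torch.nn.Linear(4, 2))
x = torch.empty(3, 4)
run = my_backend(gm, [x])
out = run(x)
```

With a real model, the same function would be passed as torch.compile(model, backend=my_backend). The check only reads the device attribute, so it does not itself need any XYZ kernels, but creating a tensor on XYZ in the first place still requires at least an allocation path, as discussed above.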