Setting DTensor OpDispatcher's allow_implicit_replication flag from an environment variable for distributed inference of HuggingFace models

When the DTensor dispatcher encounters an op with a mixture of torch.Tensor and DTensor arguments, the following error is raised:

  File "/home/.../torch/distributed/_tensor/_dispatch.py", line 354, in try_get_replicate_spec
    raise RuntimeError(
RuntimeError: aten.embedding.default: got mixed torch.Tensor and DTensor, need to convert all torch.Tensor to DTensor before calling distributed operators!
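
For context, here is a minimal sketch of how the mixed-argument case comes up (the mesh and shapes are illustrative assumptions, and the default process group is assumed to be initialized already, e.g. via torchrun):

  import torch
  import torch.nn.functional as F
  from torch.distributed._tensor import DeviceMesh, Replicate, distribute_tensor

  # assumes the default process group is already initialized (e.g. via torchrun)
  mesh = DeviceMesh("cuda", list(range(torch.cuda.device_count())))

  # a module parameter, distributed ahead of the forward pass -> DTensor
  weight = distribute_tensor(torch.randn(100, 16), mesh, [Replicate()])

  # a tensor created at runtime, never registered on the module -> torch.Tensor
  input_ids = torch.tensor([1, 5, 42], device="cuda")

  # aten.embedding.default now receives mixed torch.Tensor and DTensor arguments
  out = F.embedding(input_ids, weight)  # raises the RuntimeError above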

This is annoying because HuggingFace models create tensors at runtime that are not set as module attributes, and hence can't be cast to DTensor before a forward pass. The error can be avoided by setting self._allow_implicit_replication to True on the OpDispatcher in torch/distributed/_tensor/_dispatch.py, but doing so currently requires either editing the local installation files or forking torch. Could this be changed to something like the following?

self._allow_implicit_replication = os.environ.get("TORCH_DTENSOR_ALLOW_IMPLICIT_REPLICATION", "0") == "1"
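
Spelled out slightly more defensively (the helper below is only an illustration of what such a patch could look like; neither it nor the env var exists in torch today):

  import os

  def _env_flag(name: str, default: bool = False) -> bool:
      # treat "1", "true", "yes" (case-insensitive) as enabled; unset -> default
      value = os.environ.get(name)
      if value is None:
          return default
      return value.strip().lower() in ("1", "true", "yes")

  # inside OpDispatcher.__init__:
  # self._allow_implicit_replication = _env_flag("TORCH_DTENSOR_ALLOW_IMPLICIT_REPLICATION")

That way, launching with TORCH_DTENSOR_ALLOW_IMPLICIT_REPLICATION=1 would enable the behaviour without touching the installation.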

Would you like to create an issue on Issues · pytorch/pytorch · GitHub for DTensor maintainers to discuss this? Thanks!

Could you use the implicit_replication() context manager instead? For example: pytorch/test/distributed/_tensor/test_dtensor.py at main · pytorch/pytorch · GitHub
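
For reference, a minimal sketch of that approach (model and input_ids are placeholders, and implicit_replication lives under the experimental namespace, so the import path may change across versions):

  from torch.distributed._tensor.experimental import implicit_replication

  # model: a module whose parameters were already distributed as DTensors
  # input_ids: a plain torch.Tensor created at runtime
  with implicit_replication():
      # inside the context, plain torch.Tensor operands are implicitly
      # treated as replicated DTensors during dispatch
      output = model(input_ids)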