Why USE_CUDA must be 0 when XLA_CUDA=1

When I run PyTorch/XLA on GPUs with Docker, I found that torch.cuda.is_available() == False. Why? If I want to use XLA with CUDA, e.g. PyTorch Lightning's pl.Trainer with (accelerator='gpu', strategy='xla'), it fails with an error saying that no GPUs were found.

Thanks for raising this issue. I think there are two parts to it:

  1. We build the PyTorch wheel with USE_CUDA=0 because of an old bug where PyTorch and PyTorch/XLA would both try to grab the CUDA device. That issue may or may not still exist. You can verify which build you have with the first sketch after this list.
  2. Lightning's naming of accelerator is a bit misleading. What you really want is (accelerator='tpu', strategy='xla'), since accelerator='tpu' really just means accelerator='xla'; you can then manually point the XLA device at the GPU with GPU_NUM_DEVICES=1 (see the second sketch after this list). We have never tested this use case, so if you run into any issue, please file a feature request with PyTorch Lightning. We can work with them to enable PyTorch/XLA on GPU through Lightning.
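
As a quick check for point 1, on a wheel built with USE_CUDA=0 there is no CUDA runtime linked in at all, so torch.version.cuda reports None:

```python
import torch

# On a wheel built with USE_CUDA=0, no CUDA runtime is linked in:
print(torch.version.cuda)         # None on a CUDA-less build
print(torch.cuda.is_available())  # False, matching what you observed
```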
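
And here is a minimal, untested sketch of the workaround from point 2. It assumes a Lightning version that accepts strategy='xla'; the exact Trainer arguments may differ across Lightning releases, which is part of why filing a feature request with them is worthwhile:

```python
import os

# Point the XLA runtime at a single GPU before importing torch_xla / Lightning.
os.environ["GPU_NUM_DEVICES"] = "1"

import pytorch_lightning as pl

# accelerator='tpu' here really means "use the XLA backend";
# the underlying XLA device is the GPU selected via GPU_NUM_DEVICES.
trainer = pl.Trainer(accelerator="tpu", strategy="xla")
# trainer.fit(model)  # `model` is your LightningModule (hypothetical here)
```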

That's a good observation, thanks for raising this. We have a feature request here: Integrating pytorch XLA when using multiple GPUs · Issue #16130 · Lightning-AI/lightning · GitHub, which will support accelerator='gpu' + strategy='xla'.

@qipengh I suggest you comment on that issue with your use case to raise attention, so the priority can be bumped and it eventually gets worked on :)) Others have already asked for this as well.

EDIT: I just realized you already did! Sorry


OK, thanks. I will try it; the reply is helpful.