Why USE_CUDA must be 0 when XLA_CUDA=1

When I run PyTorch/XLA on GPUs with Docker, I found that torch.cuda.is_available() == False. Why? If I want to use XLA with CUDA, e.g. PyTorch Lightning's pl.Trainer with (accelerator='gpu', strategy='xla'), it fails with an error saying that no GPUs were found.

Thanks for raising this issue. I think there are two parts to it:

  1. We build the PyTorch wheel with USE_CUDA=0 because of an old bug where PyTorch and PyTorch/XLA would both try to grab the CUDA device. That issue may or may not still exist. You can verify which build you have with the first sketch after this list.
  2. Lightning's naming of accelerator is a bit misleading. What you really want is (accelerator='tpu', strategy='xla'), since accelerator='tpu' really just means accelerator='xla'; you can then manually point the XLA device at the GPU with GPU_NUM_DEVICES=1 (see the second sketch after this list). We have never tested this use case, so if you run into any issue, please file a feature request with PyTorch Lightning. We can work with them to enable PyTorch/XLA on GPU through Lightning.
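
As a quick check for point 1, on a wheel built with USE_CUDA=0 there is no CUDA runtime linked in at all, so torch.version.cuda reports None:

```python
import torch

# On a wheel built with USE_CUDA=0, no CUDA runtime is linked in:
print(torch.version.cuda)         # None on a CUDA-less build
print(torch.cuda.is_available())  # False, matching what you observed
```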
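
And here is a minimal, untested sketch of the workaround from point 2. It assumes a Lightning version that accepts strategy='xla'; the exact Trainer arguments may differ across Lightning releases, which is part of why filing a feature request with them is worthwhile:

```python
import os

# Point the XLA runtime at a single GPU before importing torch_xla / Lightning.
os.environ["GPU_NUM_DEVICES"] = "1"

import pytorch_lightning as pl

# accelerator='tpu' here really means "use the XLA backend";
# the underlying XLA device is the GPU selected via GPU_NUM_DEVICES.
trainer = pl.Trainer(accelerator="tpu", strategy="xla")
# trainer.fit(model)  # `model` is your LightningModule (hypothetical here)
```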

That's a good observation, thanks for raising this. We have a feature request here: Integrating pytorch XLA when using multiple GPUs · Issue #16130 · Lightning-AI/lightning · GitHub, which will support accelerator='gpu' + strategy='xla'.

@qipengh I suggest you comment on that issue with your use case to raise attention, so the priority can be bumped and it eventually gets worked on :)) Others have already asked for this as well.

EDIT: I just realized you already did! Sorry


OK, thanks. I will try it; the reply is helpful.