PyTorch built from source, no CUDA devices found

So I built PyTorch and torch_xla following these discussions, and I can import torch and torch_xla, but I can't get any devices to work. When I call xla_device() I get an error saying "Missing XLA configuration". Ideally I want to run on GPU. If I call xla_device(devkind='GPU') the call just hangs (no clue if that's a valid device kind, but I thought I'd give it a try).

nvidia-smi shows my GPUs as expected and nvcc --version reports CUDA 10.1, but torch.cuda.is_available() returns False.

I'm not really sure how to debug further here. It seems like everything built correctly, but something in my environment isn't configured correctly, and I don't know how to find out what that is. Any thoughts?
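
For reference, this is roughly what I'm running:

import torch
import torch_xla.core.xla_model as xm

print(torch.cuda.is_available())        # prints False
device = xm.xla_device()                # -> "Missing XLA configuration"
# device = xm.xla_device(devkind='GPU') # -> hangs if I try this instead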

Can you provide some more details about the setup (e.g., build options like TORCH_CUDA_ARCH_LIST and which GPUs you are using)?

The contributing build instructions never said anything about that. That environment variable is just empty on my side (if it is normally set as part of the build process somehow, it's no longer set in my environment).
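
If it helps, I can check from Python what the local build was actually compiled with (torch.cuda.get_arch_list() is available in recent PyTorch versions):

import torch

print(torch.version.cuda)          # CUDA version the build was configured with (None if built without CUDA)
print(torch.cuda.get_arch_list())  # compute capabilities compiled in; a 3090 needs sm_86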

I'm using dual RTX 3090s; nvidia-smi returns the following:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.80       Driver Version: 460.80       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3090    Off  | 00000000:21:00.0  On |                  N/A |
|  0%   41C    P8    32W / 350W |    966MiB / 24267MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 3090    Off  | 00000000:48:00.0 Off |                  N/A |
|  0%   43C    P8    25W / 350W |      6MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Can you share the text of the build output with USE_CUDA=1 set explicitly during the build? You might also want to build with --cmake to make sure the CMake output is shown during the build.

Issues · pytorch/xla · GitHub would be a better place to ask PyTorch/XLA-related questions. For your question, I think you need to export GPU_NUM_DEVICES=2 if you want to use both of your GPUs. For the device type, if you do

import torch_xla.core.xla_model as xm
device = xm.xla_device()

you should get an XLA GPU device. You can take a look at this Colab (xla/resnet18-training.ipynb at e660482795cbf394b1ae4b29ee079a33a7a056be · pytorch/xla · GitHub)
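
Putting it together, a minimal sketch (assuming two local GPUs and a CUDA-enabled torch_xla build) would look something like this; GPU_NUM_DEVICES needs to be set before the first xla_device() call so the XLA runtime picks it up:

import os
os.environ["GPU_NUM_DEVICES"] = "2"   # or export it in the shell before launching Python

import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()              # should now resolve to an XLA device backed by the GPU
t = torch.randn(2, 2, device=device)  # ops on this tensor are compiled and run through XLA
print(device, t)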