Hi All!
`torch.xpu.synchronize()` hangs on PyTorch 2.6.0+xpu.
Steps to reproduce:
(2_6_0_xpu) $ python
Python 3.12.9 | packaged by Anaconda, Inc. | (main, Feb 6 2025, 18:56:27) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'2.6.0+xpu'
>>> t = torch.randn (10000, 10000, device = 'xpu')
>>> for i in range (100):
...     u = t @ t
...
>>> torch.xpu.synchronize()
Terminated
(2_6_0_xpu) $
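The same steps can also be run as a script in a child process with a timeout, so that a hang is reported (and the child killed) instead of having to kill python from another terminal. The sketch below is one way to do this with only the standard library. `REPRO` and `run_with_timeout` are hypothetical names, and `REPRO` itself needs a PyTorch build with xpu support. The script also prints when the `for` loop returns and when `synchronize()` returns, which separates the launch time from the time spent waiting on the device.

```python
import subprocess
import sys

# Hypothetical script mirroring the interactive session above. It needs a
# PyTorch build with xpu support and only runs when passed to run_with_timeout.
REPRO = """
import time
import torch

t = torch.randn(10000, 10000, device="xpu")
start = time.perf_counter()
for i in range(100):
    u = t @ t
print(f"loop returned after {time.perf_counter() - start:.1f} s", flush=True)
torch.xpu.synchronize()
print(f"synchronize returned after {time.perf_counter() - start:.1f} s", flush=True)
"""


def run_with_timeout(code: str, timeout: float) -> tuple[bool, str]:
    """Run `code` in a fresh Python process; return (finished, stdout so far).

    If the child is still running after `timeout` seconds, subprocess.run
    kills it and re-raises TimeoutExpired, which becomes finished=False.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-u", "-c", code],  # -u: unbuffered child output
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return True, result.stdout
    except subprocess.TimeoutExpired as exc:
        partial = exc.stdout or ""
        # On POSIX the partial output attached to TimeoutExpired can be bytes
        # even when text=True, so decode it defensively.
        if isinstance(partial, bytes):
            partial = partial.decode(errors="replace")
        return False, partial
```

For example, `finished, out = run_with_timeout(REPRO, timeout=120)` would give `finished == False` if `synchronize()` hangs, with `out` holding whatever the child printed before it was killed (such as the "loop returned" line).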
Note that the `u = t @ t` loop takes about 30 seconds to actually run on the xpu (although the `for` command itself returns promptly, because the matrix multiplications are queued asynchronously), and `torch.xpu.synchronize()` is launched within a second or two of starting the `for` loop.
In this circumstance, `torch.xpu.synchronize()` hangs, apparently indefinitely. (I let it run for five minutes before killing the python process from a separate terminal window.)
During this time, `intel_gpu_top` shows, after about thirty seconds, that the xpu is no longer in use, while `top` (and the Ubuntu system monitor) show the python process consuming “100%” of the cpu.
If `torch.xpu.synchronize()` is launched after the `u = t @ t` loop has actually completed, it doesn’t hang. It also doesn’t seem to hang if it is launched while a shorter, three-second `for i in range (10):` loop is still running.
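Since 10 iterations did not appear to hang and 100 did, one way to narrow down where the behavior changes is to bisect on the iteration count. This is a sketch under two assumptions: `hangs(n)` is a hypothetical predicate you would supply (for example, running the repro with `range (n)` in a child process with a timeout and checking whether it finished), and the hang is monotonic in `n`, which may not hold if it depends on exactly when `synchronize()` is called.

```python
from typing import Callable


def first_hanging_count(hangs: Callable[[int], bool], lo: int, hi: int) -> int:
    """Return the smallest n in (lo, hi] for which hangs(n) is True.

    Assumes hangs(lo) is False and hangs(hi) is True, and that the result is
    monotonic in n (an assumption: a timing-dependent hang may be flaky).
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if hangs(mid):
            hi = mid  # mid hangs, so the boundary is at or below mid
        else:
            lo = mid  # mid completes, so the boundary is above mid
    return hi
```

Between 10 and 100 this needs only about seven calls to `hangs`, which matters when each call may wait out a full timeout.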
PyTorch with xpu support was installed in a newly created conda environment, following the PyTorch documentation, using:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/xpu
(I had to copy /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.33 into the conda environment, replacing the conda environment’s libstdc++.so.6.0.29, in order to get the xpu to work, although I’m not clear on the details of why this was necessary.)
This is on Ubuntu 24.04.2 LTS with an Intel Core Ultra 185H processor (and also an NVIDIA graphics chip).
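When reporting this kind of issue, the environment details above can also be collected programmatically. The sketch below (`xpu_report` is a hypothetical helper) relies only on `torch.__version__`, `torch.xpu.is_available()`, `torch.xpu.device_count()` and `torch.xpu.get_device_name()`. It treats an `ImportError` or `OSError` from `import torch` (the kind of failure a mismatched shared library such as libstdc++ can cause) as "torch unavailable" rather than crashing.

```python
import platform


def xpu_report() -> dict:
    """Collect basic facts about the PyTorch/xpu setup, tolerating a broken torch."""
    report = {
        "python": platform.python_version(),
        "torch_version": None,
        "xpu_available": False,
        "xpu_devices": [],
    }
    try:
        import torch
    except (ImportError, OSError):
        # torch is missing, or failed to load one of its shared libraries
        return report
    report["torch_version"] = torch.__version__
    xpu = getattr(torch, "xpu", None)  # absent on very old torch builds
    if xpu is not None and xpu.is_available():
        report["xpu_available"] = True
        report["xpu_devices"] = [
            xpu.get_device_name(i) for i in range(xpu.device_count())
        ]
    return report


if __name__ == "__main__":
    for key, value in xpu_report().items():
        print(f"{key}: {value}")
```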
Thanks for any information or updates on the status of 2.6.0+xpu.
K. Frank