Torch errors on Intel GPU

I am trying to run some things on my laptop with an Intel GPU, using Ubuntu 24.04 under WSL2 on Windows 11 24H2. According to the PyTorch website, it should be painless enough: just change the device to ‘xpu’. Of course, I’ve done the driver installation and the other setup steps as described here.
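For reference, switching the device is essentially the only change relative to my CUDA code, roughly like this (a simplified sketch, not my exact code):

import torch

# same pattern as with 'cuda', just a different backend name
device = torch.device("xpu" if torch.xpu.is_available() else "cpu")
x = torch.randn(4, 3, 32, 32).to(device)
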
But even with a simple CNN I hit this strange error, which I can’t find anywhere (I thought everything could be found on the Internet, LOL). Running the same code on regular CUDA causes no trouble. Changing the batch size doesn’t help either, so I don’t think a lack of RAM is the cause. Here it is:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[12], line 4
     1 criterion = nn.CrossEntropyLoss()
     2 optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
----> 4 model, history_cnn = train(
     5     model, criterion, optimizer,
     6     train_batch_gen, val_batch_gen, num_epochs=40
     7 )

Cell In[8], line 44, in train(model, criterion, optimizer, train_batch_gen, val_batch_gen, num_epochs)
    42 forward_start_time = time.time()
    43 # Model output logits
---> 44 logits = model(X_batch)
    45 history['time']['forward'].append(time.time() - forward_start_time)
    47 # Compute the loss

File ~/HW/DS-поток/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1751, in Module._wrapped_call_impl(self, *args, **kwargs)
  1749     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
  1750 else:
-> 1751     return self._call_impl(*args, **kwargs)

File ~/HW/DS-поток/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1762, in Module._call_impl(self, *args, **kwargs)
  1757 # If we don't have any hooks, we want to skip the rest of the logic in
  1758 # this function, and just call forward.
  1759 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
  1760         or _global_backward_pre_hooks or _global_backward_hooks
  1761         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1762     return forward_call(*args, **kwargs)
  1764 result = None
  1765 called_always_called_hooks = set()

Cell In[9], line 29, in SimpleConvNet.forward(self, x)
    28 def forward(self, x):
---> 29     layer1 = self.relu1(self.droupout1(self.bn1(self.conv1(x))))
    30     layer2 = self.relu2(self.droupout2(self.bn2(self.conv2(layer1))))
    31     layer3 = self.relu3(self.droupout3(self.bn3(self.conv3(layer2))))

File ~/HW/DS-поток/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1751, in Module._wrapped_call_impl(self, *args, **kwargs)
  1749     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
  1750 else:
-> 1751     return self._call_impl(*args, **kwargs)

File ~/HW/DS-поток/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1762, in Module._call_impl(self, *args, **kwargs)
  1757 # If we don't have any hooks, we want to skip the rest of the logic in
  1758 # this function, and just call forward.
  1759 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
  1760         or _global_backward_pre_hooks or _global_backward_hooks
  1761         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1762     return forward_call(*args, **kwargs)
  1764 result = None
  1765 called_always_called_hooks = set()

File ~/HW/DS-поток/.venv/lib/python3.12/site-packages/torch/nn/modules/conv.py:554, in Conv2d.forward(self, input)
   553 def forward(self, input: Tensor) -> Tensor:
--> 554     return self._conv_forward(input, self.weight, self.bias)

File ~/HW/DS-поток/.venv/lib/python3.12/site-packages/torch/nn/modules/conv.py:549, in Conv2d._conv_forward(self, input, weight, bias)
   537 if self.padding_mode != "zeros":
   538     return F.conv2d(
   539         F.pad(
   540             input, self._reversed_padding_repeated_twice, mode=self.padding_mode
  (...)    547         self.groups,
   548     )
--> 549 return F.conv2d(
   550     input, weight, bias, self.stride, self.padding, self.dilation, self.groups
   551 )

RuntimeError: could not create a memory

Can anyone help? Is this interesting to anyone at all, or does everybody just use NVIDIA? (:

Hi Roman!

I have run pytorch using the “xpu” on an intel (“meteor lake”) laptop running ubuntu 24.04
(natively, not hosted on windows). The basics work more or less correctly, but there are
limitations.

From your stack trace, it looks like your program is failing at the first Conv2d in your model.

I’ve never seen the specific text “RuntimeError: could not create a memory” in an error
message, but it seems like it might be an out-of-memory error.

Several questions and comments:

Can you tell us specifically which “xpu” and which version of pytorch you are using? E.g.,
something like:

>>> import torch
>>> torch.__version__
'2.7.0+xpu'
>>> torch.xpu.get_device_properties()
_XpuDeviceProperties(name='Intel(R) Arc(TM) Graphics', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.33276+22', total_memory=29184MB, max_compute_units=128, gpu_eu_count=128, gpu_subslice_count=8, max_work_group_size=1024, max_num_sub_groups=128, sub_group_sizes=[8 16 32], has_fp16=1, has_fp64=1, has_atomic64=1)

Can you post a simplified (maybe just a single Conv2d), fully self-contained script that
reproduces your error, together with the output you get when you run it?
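
For example, something along these lines (only a sketch; the shapes and channel counts are
placeholders, so use ones that match your data):

import torch
import torch.nn as nn

device = torch.device("xpu")
print(torch.__version__)
print(torch.xpu.get_device_properties())

# a single Conv2d mirroring the first layer that fails in your trace
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3).to(device)
x = torch.randn(64, 3, 32, 32, device=device)

out = conv(x)
torch.xpu.synchronize()  # make sure the kernel actually executes
print(out.shape)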

Would it be practical to test things with pytorch running natively on windows, just to avoid
any complication the WSL layer might be introducing?

My belief (I’m not that experienced with using the xpu) is that when running on the xpu, you
are using the regular system (cpu) ram. I’ve seen complaints that xpu out-of-memory error
messages might report the total system ram as being available even if some of that memory
is actually in use, but we didn’t get any definitive follow-up on what was actually going on.
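
One thing you could check is what pytorch itself reports as the device’s total memory
(a sketch; I’m not certain how total_memory is computed for integrated graphics):

import torch

props = torch.xpu.get_device_properties()
# on integrated graphics this appears to track (a share of) system ram
print(props.name)
print(props.total_memory)  # in bytes, if it follows the cuda convention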

Can you run the exact same model (or the one in your hypothetical simplified script) successfully
on the cpu on the same system? If your problem is that most of your system ram is already
in use, you would expect to see similar out-of-memory errors on both the cpu and the xpu.
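
That is, something like this (same placeholder shapes as in the sketch above):

import torch
import torch.nn as nn

for dev in ("cpu", "xpu"):
    try:
        conv = nn.Conv2d(3, 16, kernel_size=3).to(dev)
        x = torch.randn(64, 3, 32, 32, device=dev)
        conv(x)
        print(dev, "ok")
    except RuntimeError as err:
        print(dev, "failed:", err)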

Can you get pytorch to report any useful (and correct) xpu memory statistics for your system?
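
For example, the caching-allocator counters don’t depend on the free-memory query, so they
might work where mem_get_info() doesn’t (a sketch; I haven’t verified all of these on the xpu):

import torch

x = torch.randn(1024, 1024, device="xpu")  # allocate something to count

print(torch.xpu.memory_allocated())      # bytes currently allocated by pytorch
print(torch.xpu.memory_reserved())       # bytes held by the caching allocator
print(torch.xpu.max_memory_allocated())  # peak allocation so far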

Note, when I try running torch.xpu.mem_get_info() I get:

Traceback (most recent call last):
  File "<python-input-19>", line 1, in <module>
    torch.xpu.mem_get_info()
    ~~~~~~~~~~~~~~~~~~~~~~^^
  File "<path_to_pytorch_install>/torch/xpu/memory.py", line 194, in mem_get_info
    return torch._C._xpu_getMemoryInfo(device)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
RuntimeError: The device (Intel(R) Arc(TM) Graphics) doesn't support querying the available free memory. You can file an issue at https://github.com/pytorch/pytorch/issues to help us prioritize its implementation.

Right before you start running python / pytorch, what does your ubuntu System Monitor
show you for total memory and memory in use?

Best.

K. Frank