XPU out of memory error with Intel Arc Graphics (Meteor Lake) despite sufficient system memory and reported XPU capacity

Issue Description:
I’m experiencing an XPU out of memory error when fine-tuning an LLM with Axolotl and PyTorch 2.7.0+xpu on a ThinkPad T14p Gen2 running Manjaro Linux.

PyTorch reports that the GPU has 28.56 GiB total capacity, but fails to allocate 7.09 GiB, even though system memory is plentiful and none is used by PyTorch at the time.

There is no BIOS option to change the integrated GPU memory allocation, and lspci only shows a 256 MB prefetchable memory region, which may correspond to the visible VRAM window. Either way, it does not match the memory PyTorch claims is available.
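
For reference, a minimal sketch to print what PyTorch itself reports for the XPU device, so it can be compared against the lspci window:

import torch

# Print what PyTorch reports for the integrated XPU.  The reported
# total_memory appears to reflect shared system memory rather than the
# 256 MB prefetchable BAR that lspci shows.
if torch.xpu.is_available():
    device = torch.device("xpu")
    props = torch.xpu.get_device_properties(device)
    print(f"Device Name: {torch.xpu.get_device_name(device)}")
    print(f"Total Memory: {props.total_memory / 1024**3:.2f} GiB")
    print(f"Currently Allocated: {torch.xpu.memory_allocated(device) / 1024**3:.2f} GiB")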


Hardware & Environment Details:

  • Device: ThinkPad T14p Gen2

  • CPU: Intel Core Ultra 5 125H

  • GPU: Intel Arc Graphics (Meteor Lake-P)

  • Driver: i915 (and xe?)

  • Kernel: 6.12.25-1-MANJARO

  • PyTorch Version: 2.7.0+xpu

  • Axolotl Version: 0.10.0.dev0

  • System RAM: 30 GiB (No Swap)

  • Memory Free at Crash: 25 GiB

  • lspci -v | grep -A 8 VGA:

    00:02.0 VGA compatible controller: Intel Corporation Meteor Lake-P [Intel Arc Graphics] (rev 08) (prog-if 00 [VGA controller])
      Subsystem: Lenovo Device 50e9
      Flags: bus master, fast devsel, latency 0, IRQ 157
      Memory at 4058000000 (64-bit, prefetchable) [size=16M]
      Memory at 4000000000 (64-bit, prefetchable) [size=256M]
      Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
      Capabilities: <access denied>
      Kernel driver in use: i915
      Kernel modules: i915, xe
    

Error Trace:

Loading checkpoint shards:   0%|                                                                                                                                                             | 0/4 [00:00<?, ?it/s][2025-05-06 18:36:50,956] [ERROR] [axolotl.utils.models.load_model:1273] [PID:6160] [RANK:0] XPU out of memory. Tried to allocate 7.09 GiB. GPU 0 has a total capacity of 28.56 GiB. Of the allocated memory 0 bytes is allocated by PyTorch, and 0 bytes is reserved by PyTorch but unallocated. Please use `empty_cache` to release all unoccupied cached memory.
Traceback (most recent call last):
  File "/home/libchara/finetune_intel/axolotl/src/axolotl/utils/models.py", line 1270, in load_model
    skip_move_to_device = self.build_model(qlora_fsdp)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/libchara/finetune_intel/axolotl/src/axolotl/utils/models.py", line 1128, in build_model
    self.model = self.auto_model_loader.from_pretrained(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/libchara/finetune_intel/finetune/lib/python3.12/site-packages/transformers/models/auto/auto_factory.py", line 571, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/libchara/finetune_intel/finetune/lib/python3.12/site-packages/transformers/modeling_utils.py", line 279, in _wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/libchara/finetune_intel/finetune/lib/python3.12/site-packages/transformers/modeling_utils.py", line 4399, in from_pretrained
    ) = cls._load_pretrained_model(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/libchara/finetune_intel/finetune/lib/python3.12/site-packages/transformers/modeling_utils.py", line 4793, in _load_pretrained_model
    caching_allocator_warmup(model_to_load, expanded_device_map, factor=2 if hf_quantizer is None else 4)
  File "/home/libchara/finetune_intel/finetune/lib/python3.12/site-packages/transformers/modeling_utils.py", line 5803, in caching_allocator_warmup
    _ = torch.empty(byte_count // factor, dtype=torch.float16, device=device, requires_grad=False)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: XPU out of memory. Tried to allocate 7.09 GiB. GPU 0 has a total capacity of 28.56 GiB. Of the allocated memory 0 bytes is allocated by PyTorch, and 0 bytes is reserved by PyTorch but unallocated. Please use `empty_cache` to release all unoccupied cached memory.

Additional Notes:

  • BIOS has no configurable setting for iGPU memory or ReBAR.
  • Intel Arc Graphics is iGPU; there is no discrete GPU.
  • According to Intel tools and forums, the iGPU uses system memory via dynamic allocation (DVMT), but the behavior here seems inconsistent with the memory PyTorch reports.

Expected Behavior:

  • If PyTorch reports 28 GiB of total capacity, it should not fail to allocate 7 GiB.
  • Otherwise, it should provide better diagnostics or a warning when host-side DVMT or BAR settings limit the actually usable memory.

If you need any additional details, please let me know.

Hi libchara!

I have a system similar to yours (meteor-lake thinkpad with 32 GB system ram) and have
worked with xpu tensors in excess of 10GB using, for example, 2.7.0+xpu (downloaded
as RC1).

I do believe you’re correct that the “xpu” uses system ram (i.e., cpu ram) for its memory.
It looks odd to me that, if your system has 30 GB, pytorch would report 28 GB as
available. I would expect your os and especially your windowing system (which I assume
you run) to take up several (i.e., more than two) GB before you even start pytorch.

Does your system have a typical linux “system monitor” that shows you how much memory
is already in use before you start python / pytorch?

You might try starting a fresh pytorch session and see how large a tensor you can create
directly on the xpu before you get an out-of-memory error.
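
For example, something along these lines (just a sketch; adjust the size to whatever you
want to test) will tell you whether a single large allocation succeeds on the xpu:

import torch

# Try one large allocation directly on the xpu (here ~8 GB worth of float32).
n_bytes = 8 * 1024**3
t = torch.empty(n_bytes // 4, dtype=torch.float32, device="xpu")
torch.xpu.synchronize()
print(f"allocated: {torch.xpu.memory_allocated() / 1024**3:.2f} GB")
del t
torch.xpu.empty_cache()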

(Also, although I would expect it to be smart enough not to do so, is it possible that your
model-loading scheme first loads – or decompresses – parts of the model into memory
under control of the cpu and then moves those tensors to memory under control of the
xpu? If so, you could have two copies of some of your model parameters in system
memory during the loading process.)
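
As a toy illustration of that effect (a sketch only; the tensor size here is made up), you can
watch free while a cpu tensor is moved to the xpu; both copies stay alive until the cpu copy
is released:

import os
import torch

# Create a ~2 GB float32 tensor on the cpu first...
cpu_t = torch.empty(512 * 1024**2, dtype=torch.float32)
os.system("free -h")          # host ram now holds one copy

# ...then move it to the xpu.  Until cpu_t is deleted, the data exists twice:
# once in the cpu copy and once in the xpu copy (which, on an iGPU, is also
# backed by system ram).
xpu_t = cpu_t.to("xpu")
torch.xpu.synchronize()
os.system("free -h")          # roughly double the footprint

del cpu_t                     # release the host copy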

I don’t know how pytorch goes about determining the amount of memory available to it,
especially when using the xpu. Note that intel arc xpu support is a work in progress and
not fully baked, so I would suggest you test pytorch’s reported available memory by
trying to create some xpu tensors of known size.

Best.

K. Frank


I wrote a little script for testing XPU memory allocation, and now I realize that it fails to allocate 4 GB every time.

[libchara@libchara-ThinkPad-T14p-Gen-2 ~]$ cd finetune_intel/
[libchara@libchara-ThinkPad-T14p-Gen-2 finetune_intel]$ source finetune/bin/activate                 
(finetune) [libchara@libchara-ThinkPad-T14p-Gen-2 finetune_intel]$ source /opt/intel/oneapi/setvars.sh 
 
:: initializing oneAPI environment ...
   bash: BASH_VERSION = 5.2.37(1)-release
   args: Using "$@" for setvars.sh arguments: 
:: ccl -- latest
:: compiler -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: dnnl -- latest
:: dpl -- latest
:: mkl -- latest
:: mpi -- latest
:: pti -- latest
:: tbb -- latest
:: umf -- latest
:: oneAPI environment initialized ::
 
(finetune) [libchara@libchara-ThinkPad-T14p-Gen-2 finetune_intel]$ python3 xpu_mem_test.py 

=== XPU Memory Test ===
Device Name: Intel(R) Arc(TM) Graphics

Device Memory: 14.06GB total
Max allocated during session: 0.00GB

Attempting to allocate 1.0GB tensor...
Success! Current allocated: 1.00GB
               total        used        free      shared  buff/cache   available
Mem:            30Gi       4.8Gi        23Gi       460Mi       3.2Gi        26Gi
Swap:             0B          0B          0B

Attempting to allocate 1.1GB tensor...
Success! Current allocated: 1.10GB
               total        used        free      shared  buff/cache   available
Mem:            30Gi       4.9Gi        23Gi       460Mi       3.2Gi        25Gi
Swap:             0B          0B          0B

Attempting to allocate 1.2000000000000002GB tensor...
Success! Current allocated: 1.20GB
               total        used        free      shared  buff/cache   available
Mem:            30Gi       6.1Gi        22Gi       460Mi       3.2Gi        24Gi
Swap:             0B          0B          0B

Attempting to allocate 1.3000000000000003GB tensor...
Success! Current allocated: 1.30GB
               total        used        free      shared  buff/cache   available
Mem:            30Gi       6.1Gi        22Gi       460Mi       3.2Gi        24Gi
Swap:             0B          0B          0B

Attempting to allocate 1.4000000000000004GB tensor...
Success! Current allocated: 1.40GB
               total        used        free      shared  buff/cache   available
Mem:            30Gi       6.1Gi        22Gi       460Mi       3.2Gi        24Gi
Swap:             0B          0B          0B

Attempting to allocate 1.5000000000000004GB tensor...
Success! Current allocated: 1.50GB
               total        used        free      shared  buff/cache   available
Mem:            30Gi       6.1Gi        22Gi       460Mi       3.2Gi        24Gi
Swap:             0B          0B          0B

Attempting to allocate 1.6000000000000005GB tensor...
Success! Current allocated: 1.60GB
               total        used        free      shared  buff/cache   available
Mem:            30Gi       6.9Gi        21Gi       460Mi       3.2Gi        23Gi
Swap:             0B          0B          0B

Attempting to allocate 1.7000000000000006GB tensor...
Success! Current allocated: 1.70GB
               total        used        free      shared  buff/cache   available
Mem:            30Gi       6.9Gi        21Gi       460Mi       3.2Gi        23Gi
Swap:             0B          0B          0B

Attempting to allocate 1.8000000000000007GB tensor...
Success! Current allocated: 1.80GB
               total        used        free      shared  buff/cache   available
Mem:            30Gi       7.3Gi        21Gi       460Mi       3.2Gi        23Gi
Swap:             0B          0B          0B

Attempting to allocate 1.9000000000000008GB tensor...
Success! Current allocated: 1.90GB
               total        used        free      shared  buff/cache   available
Mem:            30Gi       7.3Gi        21Gi       460Mi       3.2Gi        23Gi
Swap:             0B          0B          0B

Attempting to allocate 2.000000000000001GB tensor...
Success! Current allocated: 2.00GB
               total        used        free      shared  buff/cache   available
Mem:            30Gi       7.3Gi        21Gi       460Mi       3.2Gi        23Gi
Swap:             0B          0B          0B

Attempting to allocate 2.100000000000001GB tensor...
Success! Current allocated: 2.10GB
               total        used        free      shared  buff/cache   available
Mem:            30Gi       7.3Gi        21Gi       460Mi       3.2Gi        23Gi
Swap:             0B          0B          0B

Attempting to allocate 2.200000000000001GB tensor...
Success! Current allocated: 2.20GB
               total        used        free      shared  buff/cache   available
Mem:            30Gi       7.3Gi        21Gi       460Mi       3.2Gi        23Gi
Swap:             0B          0B          0B

Attempting to allocate 2.300000000000001GB tensor...
Success! Current allocated: 2.30GB
               total        used        free      shared  buff/cache   available
Mem:            30Gi       8.3Gi        20Gi       460Mi       3.2Gi        22Gi
Swap:             0B          0B          0B

Attempting to allocate 2.4000000000000012GB tensor...
Success! Current allocated: 2.40GB
               total        used        free      shared  buff/cache   available
Mem:            30Gi       8.3Gi        20Gi       460Mi       3.2Gi        22Gi
Swap:             0B          0B          0B

Attempting to allocate 2.5000000000000013GB tensor...
Success! Current allocated: 2.50GB
               total        used        free      shared  buff/cache   available
Mem:            30Gi       8.3Gi        20Gi       460Mi       3.2Gi        22Gi
Swap:             0B          0B          0B

Attempting to allocate 2.6000000000000014GB tensor...
Success! Current allocated: 2.60GB
               total        used        free      shared  buff/cache   available
Mem:            30Gi       8.3Gi        20Gi       460Mi       3.2Gi        22Gi
Swap:             0B          0B          0B

Attempting to allocate 2.7000000000000015GB tensor...
Success! Current allocated: 2.70GB
               total        used        free      shared  buff/cache   available
Mem:            30Gi       9.1Gi        19Gi       460Mi       3.2Gi        21Gi
Swap:             0B          0B          0B

Attempting to allocate 2.8000000000000016GB tensor...
Success! Current allocated: 2.80GB
               total        used        free      shared  buff/cache   available
Mem:            30Gi       9.3Gi        19Gi       460Mi       3.2Gi        21Gi
Swap:             0B          0B          0B

Attempting to allocate 2.9000000000000017GB tensor...
Success! Current allocated: 2.90GB
               total        used        free      shared  buff/cache   available
Mem:            30Gi       9.3Gi        19Gi       460Mi       3.2Gi        21Gi
Swap:             0B          0B          0B

Attempting to allocate 3.0000000000000018GB tensor...
Success! Current allocated: 3.00GB
               total        used        free      shared  buff/cache   available
Mem:            30Gi       9.3Gi        19Gi       460Mi       3.2Gi        21Gi
Swap:             0B          0B          0B

Attempting to allocate 3.100000000000002GB tensor...
Success! Current allocated: 3.10GB
               total        used        free      shared  buff/cache   available
Mem:            30Gi       9.3Gi        19Gi       460Mi       3.2Gi        21Gi
Swap:             0B          0B          0B

Attempting to allocate 3.200000000000002GB tensor...
Success! Current allocated: 3.20GB
               total        used        free      shared  buff/cache   available
Mem:            30Gi       9.3Gi        19Gi       460Mi       3.2Gi        21Gi
Swap:             0B          0B          0B

Attempting to allocate 3.300000000000002GB tensor...
Success! Current allocated: 3.30GB
               total        used        free      shared  buff/cache   available
Mem:            30Gi       9.3Gi        19Gi       460Mi       3.2Gi        21Gi
Swap:             0B          0B          0B

Attempting to allocate 3.400000000000002GB tensor...
Success! Current allocated: 3.40GB
               total        used        free      shared  buff/cache   available
Mem:            30Gi       9.3Gi        19Gi       460Mi       3.2Gi        21Gi
Swap:             0B          0B          0B

Attempting to allocate 3.500000000000002GB tensor...
Success! Current allocated: 3.50GB
               total        used        free      shared  buff/cache   available
Mem:            30Gi       9.4Gi        19Gi       460Mi       3.2Gi        21Gi
Swap:             0B          0B          0B

Attempting to allocate 3.6000000000000023GB tensor...
Success! Current allocated: 3.60GB
               total        used        free      shared  buff/cache   available
Mem:            30Gi       9.4Gi        19Gi       460Mi       3.2Gi        21Gi
Swap:             0B          0B          0B

Attempting to allocate 3.7000000000000024GB tensor...
Success! Current allocated: 3.70GB
               total        used        free      shared  buff/cache   available
Mem:            30Gi       9.4Gi        19Gi       460Mi       3.2Gi        21Gi
Swap:             0B          0B          0B

Attempting to allocate 3.8000000000000025GB tensor...
Success! Current allocated: 3.80GB
               total        used        free      shared  buff/cache   available
Mem:            30Gi       9.4Gi        19Gi       460Mi       3.2Gi        21Gi
Swap:             0B          0B          0B

Attempting to allocate 3.9000000000000026GB tensor...
Success! Current allocated: 3.90GB
               total        used        free      shared  buff/cache   available
Mem:            30Gi       9.4Gi        19Gi       460Mi       3.2Gi        21Gi
Swap:             0B          0B          0B

Attempting to allocate 4.000000000000003GB tensor...

Allocation failed at 4.000000000000003GB (last success: 3.9000000000000026GB)
Error message: XPU out of memory. Tried to allocate 4.00 GiB. GPU 0 has a total capacity of 14.06 GiB. Of the allocated memory 0 bytes is allocated by PyTorch, and 0 bytes is reserved by PyTorch but unallocated. Please use `empty_cache` to release all unoccupied cached memory.
               total        used        free      shared  buff/cache   available
Mem:            30Gi       9.4Gi        19Gi       460Mi       3.2Gi        21Gi
Swap:             0B          0B          0B

! Final allocated memory: 0.00GB
Test completed.
(finetune) [libchara@libchara-ThinkPad-T14p-Gen-2 finetune_intel]$ 

Here’s the script:

import torch
from torch import xpu
import os

def xpu_memory_test():
    if not xpu.is_available():
        print("XPU not available!")
        return

    device = torch.device("xpu")
    print(f"\n=== XPU Memory Test ===")

    try:
        print(f"Device Name: {torch.xpu.get_device_name(device)}")

        max_alloc = torch.xpu.max_memory_allocated(device) / (1024**3)
        total_mem = torch.xpu.get_device_properties(device).total_memory / (1024**3)

        print(f"\nDevice Memory: {total_mem:.2f}GB total")
        print(f"Max allocated during session: {max_alloc:.2f}GB")

        # Grow the requested size in 0.1 GB steps, starting at 1 GB, until an
        # allocation fails or the reported total capacity is reached.  (The
        # float step accumulates rounding error, hence sizes like
        # 1.2000000000000002 in the output.)
        size_step = 0.1
        current_size = 1.0
        last_success = 0

        while current_size <= total_mem:
            # Number of float32 elements (4 bytes each) for current_size GB.
            tensor_size = int(current_size * (1024**3 / 4))
            print(f"\nAttempting to allocate {current_size}GB tensor...")

            try:
                test_tensor = torch.empty(tensor_size, dtype=torch.float32, device=device)
                torch.xpu.synchronize(device)

                allocated = torch.xpu.memory_allocated(device) / (1024**3)
                print(f"Success! Current allocated: {allocated:.2f}GB")
                os.system("free -h")  # show host-side memory after each allocation

                # Free the tensor before trying the next, larger size.
                del test_tensor
                torch.xpu.empty_cache()
                last_success = current_size
                current_size += size_step

            except RuntimeError as e:
                print(f"\nAllocation failed at {current_size}GB (last success: {last_success}GB)")
                print(f"Error message: {str(e)}")
                os.system("free -h")
                break

    except Exception as e:
        print(f"\nError during memory test: {str(e)}")

    finally:
        allocated = torch.xpu.memory_allocated(device) / (1024**3)
        print(f"\n! Final allocated memory: {allocated:.2f}GB")
        print("Test completed.")

if __name__ == "__main__":
    xpu_memory_test()
    torch.xpu.empty_cache()
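
Since the failure happens right at 4 GB even though smaller allocations add up past that, a quick follow-up check (a separate sketch, not part of the script above) would be to hold several ~3 GB tensors at once; if that works while a single 4 GB tensor fails, the limit looks like a per-allocation cap rather than a total-capacity one:

import torch

# Hold three ~3 GB float32 tensors simultaneously (~9 GB total) on the xpu.
# If this succeeds while a single 4 GB allocation fails, the limit appears to
# be on individual allocations rather than on total device memory.
tensors = []
for i in range(3):
    n_elems = int(3 * 1024**3 / 4)   # ~3 GB worth of float32 elements
    tensors.append(torch.empty(n_elems, dtype=torch.float32, device="xpu"))
    torch.xpu.synchronize()
    print(f"held {len(tensors)} tensors, "
          f"allocated: {torch.xpu.memory_allocated() / 1024**3:.2f} GB")

del tensors
torch.xpu.empty_cache()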

I’ve got the same issue on my laptop, an Asus Vivobook S 15:
CPU: Intel Core Ultra 7 155H
RAM: 32 GB
GPU: Intel Arc Graphics
OS: Windows 11

Allocation failed at 4.000000000000003GB (last success: 3.9000000000000026GB)
Error message: XPU out of memory. Tried to allocate 4.00 GiB. GPU 0 has a total capacity of 16.44 GiB. Of the allocated memory 0 bytes is allocated by PyTorch, and 0 bytes is reserved by PyTorch but unallocated. Please use empty_cache to release all unoccupied cached memory.
‘free’ is not recognized as an internal or external command,
operable program or batch file.

! Final allocated memory: 0.00GB
Test completed.