Issue Description:
I’m hitting an XPU out-of-memory error when fine-tuning an LLM with Axolotl on PyTorch 2.7.0+xpu, on a ThinkPad T14p Gen2 running Manjaro Linux.
PyTorch reports that the GPU has 28.56 GiB of total capacity, yet it fails to allocate 7.09 GiB, even though system memory is plentiful and none of it is in use by PyTorch at the time.
The BIOS has no option to change the integrated GPU's memory allocation, and lspci shows only a 256 MB prefetchable memory region, which may be the CPU-visible VRAM window (BAR). Either way, it does not match the capacity PyTorch claims is available.
Hardware & Environment Details:
- Device: ThinkPad T14p Gen2
- CPU: Intel Core Ultra 5 125H
- GPU: Intel Arc Graphics (Meteor Lake-P)
- Driver: i915 (and xe?)
- Kernel: 6.12.25-1-MANJARO
- PyTorch Version: 2.7.0+xpu
- Axolotl Version: 0.10.0.dev0
- System RAM: 30 GiB (no swap)
- Memory Free at Crash: 25 GiB
- lspci -v | grep -A 8 VGA:

00:02.0 VGA compatible controller: Intel Corporation Meteor Lake-P [Intel Arc Graphics] (rev 08) (prog-if 00 [VGA controller])
        Subsystem: Lenovo Device 50e9
        Flags: bus master, fast devsel, latency 0, IRQ 157
        Memory at 4058000000 (64-bit, prefetchable) [size=16M]
        Memory at 4000000000 (64-bit, prefetchable) [size=256M]
        Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
        Capabilities: <access denied>
        Kernel driver in use: i915
        Kernel modules: i915, xe
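For context on the 256 MB region: on an integrated GPU, the prefetchable BARs in the lspci output are only the CPU-visible PCI apertures, not the memory the iGPU can actually use; Meteor Lake's Arc iGPU borrows system RAM dynamically, which is presumably why PyTorch can report 28.56 GiB. As a small illustration (the regex and helper below are my own, not from any tool), the pasted output can be parsed to total up those windows:

```python
import re

# The two BAR lines from the lspci output above. These are CPU-visible
# PCI windows, not a limit on the memory the iGPU can allocate from RAM.
lspci_output = """\
Memory at 4058000000 (64-bit, prefetchable) [size=16M]
Memory at 4000000000 (64-bit, prefetchable) [size=256M]
"""

UNIT = {"K": 1024, "M": 1024**2, "G": 1024**3}

def bar_sizes_bytes(text):
    """Extract every '[size=...]' BAR size from lspci -v output, in bytes."""
    return [int(n) * UNIT[u] for n, u in re.findall(r"\[size=(\d+)([KMG])\]", text)]

sizes = bar_sizes_bytes(lspci_output)
print([s // 1024**2 for s in sizes])   # BAR sizes in MiB: [16, 256]
print(sum(sizes) // 1024**2)           # 272 MiB of CPU-visible windows, total
```

So the 272 MiB of mapped windows and the 28.56 GiB PyTorch reports measure different things, which would explain the apparent mismatch.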
Error Trace:
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s][2025-05-06 18:36:50,956] [ERROR] [axolotl.utils.models.load_model:1273] [PID:6160] [RANK:0] XPU out of memory. Tried to allocate 7.09 GiB. GPU 0 has a total capacity of 28.56 GiB. Of the allocated memory 0 bytes is allocated by PyTorch, and 0 bytes is reserved by PyTorch but unallocated. Please use `empty_cache` to release all unoccupied cached memory.
Traceback (most recent call last):
File "/home/libchara/finetune_intel/axolotl/src/axolotl/utils/models.py", line 1270, in load_model
skip_move_to_device = self.build_model(qlora_fsdp)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/libchara/finetune_intel/axolotl/src/axolotl/utils/models.py", line 1128, in build_model
self.model = self.auto_model_loader.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/libchara/finetune_intel/finetune/lib/python3.12/site-packages/transformers/models/auto/auto_factory.py", line 571, in from_pretrained
return model_class.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/libchara/finetune_intel/finetune/lib/python3.12/site-packages/transformers/modeling_utils.py", line 279, in _wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/libchara/finetune_intel/finetune/lib/python3.12/site-packages/transformers/modeling_utils.py", line 4399, in from_pretrained
) = cls._load_pretrained_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/libchara/finetune_intel/finetune/lib/python3.12/site-packages/transformers/modeling_utils.py", line 4793, in _load_pretrained_model
caching_allocator_warmup(model_to_load, expanded_device_map, factor=2 if hf_quantizer is None else 4)
File "/home/libchara/finetune_intel/finetune/lib/python3.12/site-packages/transformers/modeling_utils.py", line 5803, in caching_allocator_warmup
_ = torch.empty(byte_count // factor, dtype=torch.float16, device=device, requires_grad=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: XPU out of memory. Tried to allocate 7.09 GiB. GPU 0 has a total capacity of 28.56 GiB. Of the allocated memory 0 bytes is allocated by PyTorch, and 0 bytes is reserved by PyTorch but unallocated. Please use `empty_cache` to release all unoccupied cached memory.
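The last frame explains the size of the failed request: transformers' caching_allocator_warmup calls torch.empty with byte_count // factor float16 elements, and since each float16 element is 2 bytes, with factor=2 that is a single contiguous block of roughly byte_count bytes. A sketch of that arithmetic, assuming for illustration that the request comes out to the 7.09 GiB seen in the error (the byte_count value below is back-computed from the trace, not taken from the actual checkpoint):

```python
GIB = 1024**3

# From the trace: torch.empty(byte_count // factor, dtype=torch.float16, ...)
# allocates (byte_count // factor) float16 elements of 2 bytes each,
# i.e. one contiguous request of (byte_count // factor) * 2 bytes.
def warmup_request_bytes(byte_count, factor):
    return (byte_count // factor) * 2

# Assumed for illustration: byte_count such that the factor=2 warmup
# request matches the failed 7.09 GiB allocation.
byte_count = int(7.09 * GIB)
req = warmup_request_bytes(byte_count, factor=2)
print(round(req / GIB, 2))   # the single contiguous request, in GiB
```

The key point is that the warmup is one contiguous allocation, not many small ones, so it fails or succeeds as a single 7.09 GiB block.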
Additional Notes:
- BIOS has no configurable setting for iGPU memory or ReBAR.
- Intel Arc Graphics is iGPU; there is no discrete GPU.
- According to Intel tools and forums, the iGPU uses system memory via dynamic allocation (DVMT), but the behavior here seems inconsistent with the reported memory.
Expected Behavior:
- If PyTorch reports 28 GiB of available capacity, allocating 7 GiB should not fail.
- Alternatively, PyTorch should emit better diagnostics or warnings when host-side DVMT or BAR settings limit the actually usable memory.
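One plausible explanation for "28 GiB reported, 7 GiB refused": Intel GPU runtimes commonly cap the size of a single allocation well below total capacity (OpenCL exposes this limit as CL_DEVICE_MAX_MEM_ALLOC_SIZE, and 4 GiB is a frequently reported value on Intel hardware). A toy model of that behavior, with the 4 GiB cap as an assumed value rather than a measured one for this device:

```python
GIB = 1024**3

# Hypothetical allocator with ample total memory but a per-allocation cap,
# mimicking a runtime whose single-allocation limit is far below capacity.
MAX_SINGLE_ALLOC = 4 * GIB             # assumed cap, for illustration
TOTAL_CAPACITY = int(28.56 * GIB)      # capacity reported in the error

def can_allocate(request, max_single=MAX_SINGLE_ALLOC, total=TOTAL_CAPACITY):
    """True if one contiguous request fits both the cap and total capacity."""
    return request <= max_single and request <= total

request = int(7.09 * GIB)
print(can_allocate(request))           # the single 7.09 GiB block is refused
print(request <= TOTAL_CAPACITY)       # yet total capacity would be fine

# The same bytes split into cap-sized chunks would all fit individually:
chunks = [MAX_SINGLE_ALLOC] * (request // MAX_SINGLE_ALLOC) \
         + [request % MAX_SINGLE_ALLOC]
print(all(can_allocate(c) for c in chunks), sum(chunks) == request)
```

If this is what is happening, checking the device's reported max allocation size (e.g. via clinfo) would confirm it, and it would explain why the failure mentions nothing about total memory pressure.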
If you need any additional details, please let me know.