Comfy_UI: Attempting to use hipBLASLt on an unsupported architecture!

Hello,

I’m trying to run Comfy_UI on my RX 7900 XT. I tried installing ROCm and the nightly build of PyTorch in a Fedora distrobox, as per the installation guide on the official PyTorch website, and I also tried the pre-built Ubuntu+ROCm+PyTorch Docker image from AMD’s site (PyTorch on ROCm — ROCm installation (Linux)). In both cases Comfy_UI starts up as normal, but as soon as I queue an operation it shuts down with the error “Attempting to use hipBLASLt on an unsupported architecture!”.
From what I’ve read, this issue on gfx1100 cards was supposed to be fixed in PyTorch 2.5.1, somewhere around October 2024. Just to see whether ROCm was causing the problem, I updated the AMD pre-built container from ROCm 6.2.4 to 6.3, but nothing changed. I don’t know what my next troubleshooting step should be, especially since the container AMD assembled and tested doesn’t work either.
Please help.

Host system:

Operating System: Debian GNU/Linux 12
KDE Plasma Version: 6.2.5
KDE Frameworks Version: 6.10.0
Qt Version: 6.7.2
Kernel Version: 6.12.9-amd64 (64-bit)
Graphics Platform: Wayland
Processors: 24 × AMD Ryzen 9 7900X 12-Core Processor
Memory: 61.9 GiB of RAM
Graphics Processor: AMD Radeon RX 7900 XT
Manufacturer: ASUS

Comfy_UI output on the AMD pre-built container:

Total VRAM 20464 MB, total RAM 63432 MB
pytorch version: 2.6.0.dev20241122+rocm6.2
Set vram state to: NORMAL_VRAM
Device: cuda:0 Radeon RX 7900 XT : native
Using sub quadratic optimization for attention, if you have memory or speed issues try using: --use-split-cross-attention
[Prompt Server] web root: /home/sersys/pyproj/ComfyUI-0.3.12/web

Import times for custom nodes:
   0.0 seconds: /home/sersys/pyproj/ComfyUI-0.3.12/custom_nodes/websocket_image_save.py

Starting server

To see the GUI go to: http://127.0.0.1:8188
got prompt
model weight dtype torch.float16, manual cast: None
model_type EPS
Using split attention in VAE
Using split attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.float32
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load SDXLClipModel
loaded completely 9.5367431640625e+25 1560.802734375 True
/home/sersys/pyproj/ComfyUI-0.3.12/comfy/ops.py:64: UserWarning: Attempting to use hipBLASLt on an unsupported architecture! Overriding blas backend to hipblas (Triggered internally at /pytorch/aten/src/ATen/Context.cpp:296.)
  return torch.nn.functional.linear(input, weight, bias)
Requested to load SDXL
loaded completely 9.5367431640625e+25 4897.0483474731445 True
  0%|                                                                | 0/20 [00:00<?, ?it/s]:0:rocdevice.cpp            :2984: 93862815501 us: [pid:318562 tid:0x7f122e5ff640] Callback: Queue 0x7f0ed8000000 aborting with error : HSA_STATUS_ERROR_OUT_OF_REGISTERS: Kernel has requested more VGPRs than are available on this agent code: 0x2d
Aborted (core dumped)

The PyTorch version used is 2.6.0.dev20241122+rocm6.2.
Can you please try the latest PyTorch ROCm nightly? The issue should have been fixed there.

Hello,
Thank you for the suggestion. Apparently I forgot to include the output from my other setup, a Fedora distrobox, in my previous post; that is where I manually installed ROCm and PyTorch separately. I have indeed tried the latest version, but just as a sanity check I re-ran the instructions from the PyTorch installation guide page:

$ pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.3
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://download.pytorch.org/whl/nightly/rocm6.3
Requirement already satisfied: torch in ./.local/lib/python3.13/site-packages (2.7.0.dev20250110+rocm6.3)
Requirement already satisfied: torchvision in ./.local/lib/python3.13/site-packages (0.22.0.dev20250110+rocm6.3)
Requirement already satisfied: torchaudio in ./.local/lib/python3.13/site-packages (2.6.0.dev20250110+rocm6.3)
Requirement already satisfied: filelock in ./.local/lib/python3.13/site-packages (from torch) (3.16.1)
Requirement already satisfied: typing-extensions>=4.10.0 in ./.local/lib/python3.13/site-packages (from torch) (4.12.2)
Requirement already satisfied: setuptools in ./.local/lib/python3.13/site-packages (from torch) (72.1.0)
Requirement already satisfied: sympy==1.13.1 in ./.local/lib/python3.13/site-packages (from torch) (1.13.1)
Requirement already satisfied: networkx in ./.local/lib/python3.13/site-packages (from torch) (3.4.2)
Requirement already satisfied: jinja2 in ./.local/lib/python3.13/site-packages (from torch) (3.1.4)
Requirement already satisfied: fsspec in ./.local/lib/python3.13/site-packages (from torch) (2024.10.0)
Requirement already satisfied: pytorch-triton-rocm==3.2.0+git0d4682f0 in ./.local/lib/python3.13/site-packages (from torch) (3.2.0+git0d4682f0)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in ./.local/lib/python3.13/site-packages (from sympy==1.13.1->torch) (1.3.0)
Requirement already satisfied: numpy in ./.local/lib/python3.13/site-packages (from torchvision) (2.1.2)
Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in ./.local/lib/python3.13/site-packages (from torchvision) (11.0.0)
Requirement already satisfied: MarkupSafe>=2.0 in ./.local/lib/python3.13/site-packages (from jinja2->torch) (2.1.5)

This is the output of Comfy_UI:

Checkpoint files will always be loaded safely.
Total VRAM 20464 MB, total RAM 63431 MB
pytorch version: 2.7.0.dev20250110+rocm6.3
Set vram state to: NORMAL_VRAM
Device: cuda:0 Radeon RX 7900 XT : native
Using sub quadratic optimization for attention, if you have memory or speed issues try using: --use-split-cross-attention
[Prompt Server] web root: /home/sersys/pyproj/ComfyUI-0.3.12/web

Import times for custom nodes:
   0.0 seconds: /home/sersys/pyproj/ComfyUI-0.3.12/custom_nodes/websocket_image_save.py

Starting server

To see the GUI go to: http://127.0.0.1:8188
got prompt
model weight dtype torch.float16, manual cast: None
model_type EPS
Using split attention in VAE
Using split attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.float32
Requested to load SDXLClipModel
loaded completely 9.5367431640625e+25 1560.802734375 True
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cuda:0, dtype: torch.float16
/home/sersys/.local/lib/python3.13/site-packages/torch/functional.py:407: UserWarning: Attempting to use hipBLASLt on an unsupported architecture! Overriding blas backend to hipblas (Triggered internally at /pytorch/aten/src/ATen/Context.cpp:328.)
  return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
Requested to load SDXL
loaded completely 9.5367431640625e+25 4897.0483474731445 True
  0%|                                                                                                                           | 0/20 [00:00<?, ?it/s]:0:rocdevice.cpp            :3020: 134126665871d us:  Callback: Queue 0x7ff2c0100000 aborting with error : HSA_STATUS_ERROR_OUT_OF_REGISTERS: Kernel has requested more VGPRs than are available on this agent code: 0x2d
Aborted (core dumped)

I’ve also checked the AMD-provided pre-built Ubuntu container, which I have by now updated while trying to fix the issue:

$pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.3
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://download.pytorch.org/whl/nightly/rocm6.3
Requirement already satisfied: torch in ./.local/lib/python3.10/site-packages (2.6.0)
Requirement already satisfied: torchvision in ./.local/lib/python3.10/site-packages (0.21.0)
Requirement already satisfied: torchaudio in ./.local/lib/python3.10/site-packages (2.6.0)
Requirement already satisfied: nvidia-nvtx-cu12==12.4.127 in ./.local/lib/python3.10/site-packages (from torch) (12.4.127)
Requirement already satisfied: sympy==1.13.1 in ./.local/lib/python3.10/site-packages (from torch) (1.13.1)
Requirement already satisfied: typing-extensions>=4.10.0 in ./.local/lib/python3.10/site-packages (from torch) (4.12.2)
Requirement already satisfied: nvidia-cufft-cu12==11.2.1.3 in ./.local/lib/python3.10/site-packages (from torch) (11.2.1.3)
Requirement already satisfied: networkx in ./.local/lib/python3.10/site-packages (from torch) (3.4.2)
Requirement already satisfied: nvidia-nccl-cu12==2.21.5 in ./.local/lib/python3.10/site-packages (from torch) (2.21.5)
Requirement already satisfied: nvidia-cusparse-cu12==12.3.1.170 in ./.local/lib/python3.10/site-packages (from torch) (12.3.1.170)
Requirement already satisfied: fsspec in ./.local/lib/python3.10/site-packages (from torch) (2024.10.0)
Requirement already satisfied: jinja2 in ./.local/lib/python3.10/site-packages (from torch) (3.1.4)
Requirement already satisfied: nvidia-cublas-cu12==12.4.5.8 in ./.local/lib/python3.10/site-packages (from torch) (12.4.5.8)
Requirement already satisfied: nvidia-cusparselt-cu12==0.6.2 in ./.local/lib/python3.10/site-packages (from torch) (0.6.2)
Requirement already satisfied: filelock in ./.local/lib/python3.10/site-packages (from torch) (3.16.1)
Requirement already satisfied: nvidia-cuda-cupti-cu12==12.4.127 in ./.local/lib/python3.10/site-packages (from torch) (12.4.127)
Requirement already satisfied: nvidia-curand-cu12==10.3.5.147 in ./.local/lib/python3.10/site-packages (from torch) (10.3.5.147)
Requirement already satisfied: nvidia-cuda-runtime-cu12==12.4.127 in ./.local/lib/python3.10/site-packages (from torch) (12.4.127)
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.4.127 in ./.local/lib/python3.10/site-packages (from torch) (12.4.127)
Requirement already satisfied: nvidia-nvjitlink-cu12==12.4.127 in ./.local/lib/python3.10/site-packages (from torch) (12.4.127)
Requirement already satisfied: nvidia-cudnn-cu12==9.1.0.70 in ./.local/lib/python3.10/site-packages (from torch) (9.1.0.70)
Requirement already satisfied: nvidia-cusolver-cu12==11.6.1.9 in ./.local/lib/python3.10/site-packages (from torch) (11.6.1.9)
Requirement already satisfied: triton==3.2.0 in ./.local/lib/python3.10/site-packages (from torch) (3.2.0)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in ./.local/lib/python3.10/site-packages (from sympy==1.13.1->torch) (1.3.0)
Requirement already satisfied: numpy in ./.local/lib/python3.10/site-packages (from torchvision) (2.1.2)
Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in ./.local/lib/python3.10/site-packages (from torchvision) (11.0.0)
Requirement already satisfied: MarkupSafe>=2.0 in ./.local/lib/python3.10/site-packages (from jinja2->torch) (2.1.5)

Previously I got the same Comfy_UI output as above, but now it’s different; probably restarting the distrobox did something.
However, now it’s looking for NVIDIA drivers for some reason:

Checkpoint files will always be loaded safely.
Traceback (most recent call last):
  File "/home/sersys/pyproj/ComfyUI-0.3.12/main.py", line 136, in <module>
    import execution
  File "/home/sersys/pyproj/ComfyUI-0.3.12/execution.py", line 13, in <module>
    import nodes
  File "/home/sersys/pyproj/ComfyUI-0.3.12/nodes.py", line 22, in <module>
    import comfy.diffusers_load
  File "/home/sersys/pyproj/ComfyUI-0.3.12/comfy/diffusers_load.py", line 3, in <module>
    import comfy.sd
  File "/home/sersys/pyproj/ComfyUI-0.3.12/comfy/sd.py", line 6, in <module>
    from comfy import model_management
  File "/home/sersys/pyproj/ComfyUI-0.3.12/comfy/model_management.py", line 166, in <module>
    total_vram = get_total_memory(get_torch_device()) / (1024 * 1024)
  File "/home/sersys/pyproj/ComfyUI-0.3.12/comfy/model_management.py", line 129, in get_torch_device
    return torch.device(torch.cuda.current_device())
  File "/home/sersys/.local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 971, in current_device
    _lazy_init()
  File "/home/sersys/.local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 319, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

This seems to be a known bug, and ironically the only solution is to downgrade PyTorch, in which case I’m facing the hipBLASLt issue again.
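For what it’s worth, a quick way I found to check which backend a given torch wheel was built for is to inspect its version attributes: a ROCm build carries a +rocm suffix and a non-None torch.version.hip, while a CUDA build reports torch.version.cuda instead (a small diagnostic one-liner, run inside the container in question):

python3 -c "import torch; print(torch.__version__, torch.version.hip, torch.version.cuda)"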

I can no longer install pytorch.

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.3
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://download.pytorch.org/whl/nightly/rocm6.3
Collecting torch
  Downloading https://download.pytorch.org/whl/nightly/rocm6.3/torch-2.7.0.dev20250206%2Brocm6.3-cp310-cp310-manylinux_2_28_x86_64.whl (4323.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 4.3/4.3 GB 35.6 MB/s eta 0:00:01
ERROR: Exception:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/pip/_internal/cli/base_command.py", line 165, in exc_logging_wrapper
    status = run_func(*args)
  File "/usr/lib/python3/dist-packages/pip/_internal/cli/req_command.py", line 205, in wrapper
    return func(self, options, args)
  File "/usr/lib/python3/dist-packages/pip/_internal/commands/install.py", line 339, in run
    requirement_set = resolver.resolve(
  File "/usr/lib/python3/dist-packages/pip/_internal/resolution/resolvelib/resolver.py", line 94, in resolve
    result = self._result = resolver.resolve(
  File "/usr/lib/python3/dist-packages/pip/_vendor/resolvelib/resolvers.py", line 481, in resolve
    state = resolution.resolve(requirements, max_rounds=max_rounds)
  File "/usr/lib/python3/dist-packages/pip/_vendor/resolvelib/resolvers.py", line 348, in resolve
    self._add_to_criteria(self.state.criteria, r, parent=None)
  File "/usr/lib/python3/dist-packages/pip/_vendor/resolvelib/resolvers.py", line 172, in _add_to_criteria
    if not criterion.candidates:
  File "/usr/lib/python3/dist-packages/pip/_vendor/resolvelib/structs.py", line 151, in __bool__
    return bool(self._sequence)
  File "/usr/lib/python3/dist-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 155, in __bool__
    return any(self)
  File "/usr/lib/python3/dist-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 143, in <genexpr>
    return (c for c in iterator if id(c) not in self._incompatible_ids)
  File "/usr/lib/python3/dist-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 47, in _iter_built
    candidate = func()
  File "/usr/lib/python3/dist-packages/pip/_internal/resolution/resolvelib/factory.py", line 215, in _make_candidate_from_link
    self._link_candidate_cache[link] = LinkCandidate(
  File "/usr/lib/python3/dist-packages/pip/_internal/resolution/resolvelib/candidates.py", line 288, in __init__
    super().__init__(
  File "/usr/lib/python3/dist-packages/pip/_internal/resolution/resolvelib/candidates.py", line 158, in __init__
    self.dist = self._prepare()
  File "/usr/lib/python3/dist-packages/pip/_internal/resolution/resolvelib/candidates.py", line 227, in _prepare
    dist = self._prepare_distribution()
  File "/usr/lib/python3/dist-packages/pip/_internal/resolution/resolvelib/candidates.py", line 299, in _prepare_distribution
    return preparer.prepare_linked_requirement(self._ireq, parallel_builds=True)
  File "/usr/lib/python3/dist-packages/pip/_internal/operations/prepare.py", line 487, in prepare_linked_requirement
    return self._prepare_linked_requirement(req, parallel_builds)
  File "/usr/lib/python3/dist-packages/pip/_internal/operations/prepare.py", line 532, in _prepare_linked_requirement
    local_file = unpack_url(
  File "/usr/lib/python3/dist-packages/pip/_internal/operations/prepare.py", line 214, in unpack_url
    file = get_http_url(
  File "/usr/lib/python3/dist-packages/pip/_internal/operations/prepare.py", line 94, in get_http_url
    from_path, content_type = download(link, temp_dir.path)
  File "/usr/lib/python3/dist-packages/pip/_internal/network/download.py", line 146, in __call__
    for chunk in chunks:
  File "/usr/lib/python3/dist-packages/pip/_internal/cli/progress_bars.py", line 304, in _rich_progress_bar
    for chunk in iterable:
  File "/usr/lib/python3/dist-packages/pip/_internal/network/utils.py", line 63, in response_chunks
    for chunk in response.raw.stream(
  File "/usr/lib/python3/dist-packages/pip/_vendor/urllib3/response.py", line 576, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "/usr/lib/python3/dist-packages/pip/_vendor/urllib3/response.py", line 519, in read
    data = self._fp.read(amt) if not fp_closed else b""
  File "/usr/lib/python3/dist-packages/pip/_vendor/cachecontrol/filewrapper.py", line 96, in read
    self._close()
  File "/usr/lib/python3/dist-packages/pip/_vendor/cachecontrol/filewrapper.py", line 76, in _close
    self.__callback(result)
  File "/usr/lib/python3/dist-packages/pip/_vendor/cachecontrol/controller.py", line 331, in cache_response
    self.serializer.dumps(request, response, body),
  File "/usr/lib/python3/dist-packages/pip/_vendor/cachecontrol/serialize.py", line 70, in dumps
    return b",".join([b"cc=4", msgpack.dumps(data, use_bin_type=True)])
  File "/usr/lib/python3/dist-packages/pip/_vendor/msgpack/__init__.py", line 35, in packb
    return Packer(**kwargs).pack(o)
  File "/usr/lib/python3/dist-packages/pip/_vendor/msgpack/fallback.py", line 885, in pack
    self._pack(obj)
  File "/usr/lib/python3/dist-packages/pip/_vendor/msgpack/fallback.py", line 864, in _pack
    return self._pack_map_pairs(
  File "/usr/lib/python3/dist-packages/pip/_vendor/msgpack/fallback.py", line 970, in _pack_map_pairs
    self._pack(v, nest_limit - 1)
  File "/usr/lib/python3/dist-packages/pip/_vendor/msgpack/fallback.py", line 864, in _pack
    return self._pack_map_pairs(
  File "/usr/lib/python3/dist-packages/pip/_vendor/msgpack/fallback.py", line 970, in _pack_map_pairs
    self._pack(v, nest_limit - 1)
  File "/usr/lib/python3/dist-packages/pip/_vendor/msgpack/fallback.py", line 821, in _pack
    raise ValueError("Memoryview is too large")
ValueError: Memoryview is too large

I’ve finally managed to install the latest ROCm and PyTorch versions and got it running:

$ apt show rocm-libs -a
Package: rocm-libs
Version: 6.3.2.60302-66~24.04
Priority: optional
Section: devel
Maintainer: ROCm Dev Support <rocm-dev.support@amd.com>
Installed-Size: 13.3 kB
Depends: hipblas (= 2.3.0.60302-66~24.04), hipblaslt (= 0.10.0.60302-66~24.04), hipfft (= 1.0.17.60302-66~24.04), hipsolver (= 2.3.0.60302-66~24.04), hipsparse (= 3.1.2.60302-66~24.04), hiptensor (= 1.4.0.60302-66~24.04), miopen-hip (= 3.3.0.60302-66~24.04), half (= 1.12.0.60302-66~24.04), rccl (= 2.21.5.60302-66~24.04), rocalution (= 3.2.1.60302-66~24.04), rocblas (= 4.3.0.60302-66~24.04), rocfft (= 1.0.31.60302-66~24.04), rocrand (= 3.2.0.60302-66~24.04), hiprand (= 2.11.1.60302-66~24.04), rocsolver (= 3.27.0.60302-66~24.04), rocsparse (= 3.3.0.60302-66~24.04), rocm-core (= 6.3.2.60302-66~24.04), hipsparselt (= 0.2.2.60302-66~24.04), composablekernel-dev (= 1.1.0.60302-66~24.04), hipblas-dev (= 2.3.0.60302-66~24.04), hipblaslt-dev (= 0.10.0.60302-66~24.04), hipcub-dev (= 3.3.0.60302-66~24.04), hipfft-dev (= 1.0.17.60302-66~24.04), hipsolver-dev (= 2.3.0.60302-66~24.04), hipsparse-dev (= 3.1.2.60302-66~24.04), hiptensor-dev (= 1.4.0.60302-66~24.04), miopen-hip-dev (= 3.3.0.60302-66~24.04), rccl-dev (= 2.21.5.60302-66~24.04), rocalution-dev (= 3.2.1.60302-66~24.04), rocblas-dev (= 4.3.0.60302-66~24.04), rocfft-dev (= 1.0.31.60302-66~24.04), rocprim-dev (= 3.3.0.60302-66~24.04), rocrand-dev (= 3.2.0.60302-66~24.04), hiprand-dev (= 2.11.1.60302-66~24.04), rocsolver-dev (= 3.27.0.60302-66~24.04), rocsparse-dev (= 3.3.0.60302-66~24.04), rocthrust-dev (= 3.3.0.60302-66~24.04), rocwmma-dev (= 1.6.0.60302-66~24.04), hipsparselt-dev (= 0.2.2.60302-66~24.04)
Homepage: https://github.com/RadeonOpenCompute/ROCm
Download-Size: 1058 B
APT-Manual-Installed: yes
APT-Sources: http://repo.radeon.com/rocm/apt/6.3.2 noble/main amd64 Packages
Description: Radeon Open Compute (ROCm) Runtime software stack

and

pip show torch
Name: torch
Version: 2.7.0.dev20250206+rocm6.3
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3-Clause
Location: /home/sersys/pyproj/rocmtorch1/lib/python3.12/site-packages
Requires: filelock, fsspec, jinja2, networkx, pytorch-triton-rocm, setuptools, sympy, typing-extensions
Required-by: kornia, spandrel, torchaudio, torchsde, torchvision

but I’m still getting the hipBLASLt warning, followed by the same crash:

Checkpoint files will always be loaded safely.
Total VRAM 20464 MB, total RAM 63431 MB
pytorch version: 2.7.0.dev20250206+rocm6.3
Set vram state to: NORMAL_VRAM
Device: cuda:0 Radeon RX 7900 XT : native
Using sub quadratic optimization for attention, if you have memory or speed issues try using: --use-split-cross-attention
ComfyUI version: 0.3.14
[Prompt Server] web root: /home/sersys/pyproj/ComfyUI-0.3.14/web

Import times for custom nodes:
   0.0 seconds: /home/sersys/pyproj/ComfyUI-0.3.14/custom_nodes/websocket_image_save.py

Starting server

To see the GUI go to: http://127.0.0.1:8188
got prompt
model weight dtype torch.float16, manual cast: None
model_type EPS
Using split attention in VAE
Using split attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.float32
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load SD1ClipModel
loaded completely 7118.8 235.84423828125 True
/home/sersys/pyproj/rocmtorch1/lib/python3.12/site-packages/torch/functional.py:408: UserWarning: Attempting to use hipBLASLt on an unsupported architecture! Overriding blas backend to hipblas (Triggered internally at /pytorch/aten/src/ATen/Context.cpp:328.)
  return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
Requested to load BaseModel
loaded completely 6595.45478515625 1639.406135559082 True
  0%|                                                                                                                | 0/20 [00:00<?, ?it/s]:0:rocdevice.cpp            :3020: 18822005164d us:  Callback: Queue 0x7fdb5c400000 aborting with error : HSA_STATUS_ERROR_OUT_OF_REGISTERS: Kernel has requested more VGPRs than are available on this agent code: 0x2d
Aborted (core dumped)

There was a similar signature in [ROCm] Fix ADDMM hipBLASLt regression by naromero77amd · Pull Request #138267 · pytorch/pytorch · GitHub, which got fixed. In that case it was a TORCH_CHECK, which can lead to a runtime error.
However, in the stack above the hipBLASLt message comes from a different file, pytorch/aten/src/ATen/Context.cpp at main · pytorch/pytorch · GitHub, and there it is a TORCH_WARN_ONCE, which should not trigger a segfault.
Can you check the core dumps and confirm the backtrace?
I see this error: “rocdevice.cpp :3020: 18822005164d us: Callback: Queue 0x7fdb5c400000 aborting with error : HSA_STATUS_ERROR_OUT_OF_REGISTERS: Kernel has requested more VGPRs than are available on this agent code: 0x2d”. This might not be a PyTorch issue.
Also, can you please provide the Comfy_UI command to reproduce the bug?

How do I “confirm the backtrace”?
I’ve managed to get core dumps to generate, but I don’t really know what to do with them. I tried to use gdb, but I’m really lost, especially since the Python environment and the ROCm installation are inside a distrobox and I get multiple messages that files and directories either were not found or could not be opened. Also, the bt command seems to backtrace python3 instead of main.py. I’m not even sure gdb is the right tool to use.
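For reference, this is roughly how I understand it is usually done (a sketch with placeholder paths; running gdb inside the same distrobox should let it resolve the same libraries the process used):

gdb $(which python3) /path/to/core
(gdb) bt                      # backtrace of the crashing thread
(gdb) thread apply all bt     # backtraces of all threads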

How do I get the commands out of Comfy_UI?
I don’t know how to use the CLI version of Comfy_UI, and I couldn’t find any guide on how to print the command that was used by the GUI. If it helps, I’m using the example workflow that comes with Comfy_UI, with some checkpoints I’ve downloaded. I’ve tried using different checkpoints to see if that’s the problem, but the result is always the same error.

I’ve managed to run gdb without errors, apart from that failed download, and I think it’s giving me the right information now. I don’t know if any of this is useful, but it gave me this:

[Thread debugging using libthread_db enabled]                                                                                               
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `python3 /home/sersys/pyproj/ComfyUI-0.3.14/main.py'.
Program terminated with signal SIGABRT, Aborted.
Download failed: Invalid argument.  Continuing without source file ./nptl/./nptl/pthread_kill.c.
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44

warning: 44     ./nptl/pthread_kill.c: No such file or directory
[Current thread is 1 (Thread 0x7f7d543ff6c0 (LWP 556574))]
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007f813841427e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007f81383f78ff in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007f80eacd698c in amd::roc::callbackQueue(hsa_status_t, hsa_queue_s*, void*) ()
   from /home/sersys/pyproj/rocmtorch1/lib/python3.12/site-packages/torch/lib/libamdhip64.so
#6  0x00007f8079e4e7b7 in bool rocr::AMD::AqlQueue::DynamicQueueEventsHandler<true>(long, void*) ()
   from /home/sersys/pyproj/rocmtorch1/lib/python3.12/site-packages/torch/lib/libhsa-runtime64.so
#7  0x00007f8079e793b1 in rocr::core::Runtime::AsyncEventsLoop(void*) ()
   from /home/sersys/pyproj/rocmtorch1/lib/python3.12/site-packages/torch/lib/libhsa-runtime64.so
#8  0x00007f8079e2ae77 in rocr::os::ThreadTrampoline(void*) ()
   from /home/sersys/pyproj/rocmtorch1/lib/python3.12/site-packages/torch/lib/libhsa-runtime64.so
#9  0x00007f813846baa4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
#10 0x00007f81384f8c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

Hey @sersys ,
I have a couple thoughts here.

There appear to be two issues here:

  1. PyTorch complains that the hipBLASLt backend is not supported for your GPU
  2. Execution fails with the claim that too many vector general-purpose registers (VGPRs) are being requested

For 1), it appears non-fatal, but I can’t seem to find anything in the ROCm docs that suggests the Radeon RX 7900 XT GPU is not supported by hipBLASLt. I’m circling back with AMD’s hipBLASLt and PyTorch teams about this warning to see if we can get more information.

For 2), my understanding is that PyTorch will compile some kernels on the fly via MIOpen for the architecture detected at runtime. That dynamic compilation may assume a maximum number of threads per workgroup, which influences the VGPR-per-thread allocation. Comfy_UI or PyTorch may be launching the kernel with more threads per workgroup than are allowed, causing this error. What do you have the batch size set to in Comfy_UI? Another thing you might try is fixing all of the precision to float16, rather than the mixed precision you currently have going on (see the sketch below). I’m not an expert in Comfy_UI, but perhaps they have this as a setting somewhere?
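For example, if your Comfy_UI build exposes them (check python3 main.py --help), flags along these lines should pin everything to float16; the exact flag names are my assumption, not something I’ve verified against your version:

python3 main.py --force-fp16 --fp16-vae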

Interestingly, the Debian group seems to have caught a similar “out of registers” bug in their CI on the same GPU architecture, but in a lapack/float_complex unit test (Bug#1078724: W7800 (Navi 31; gfx1100): incorrect VGPR count).

If you’re able to share your Comfy_UI workflow file, I’m happy to try to reproduce the issue on matching hardware.

Hey @sersys ,
I’ve opened a ticket on your behalf at [Issue]: Comfy_UI hipblasLT not supported for Radeon 7900XT and HSA_STATUS_ERROR_OUT_OF_REGISTERS error · Issue #4437 · ROCm/ROCm · GitHub

With regards to the unsupported-GPU-in-hipBLASLt issue, AMD states that it has been resolved in the PyTorch nightly builds with ROCm 6.3.

However, I’ve pressed them on the report here, which shows you using a nightly build from February 6, 2025 that still produces the “hipBLASLt on an unsupported architecture” warning. Note that this just means BLAS calls will fall back to hipBLAS; it is not fatal.

This is likely unrelated to the crash with the out-of-registers error. Can you re-run the same example, in the same environment, but with the following environment variables set:

export AMD_LOG_LEVEL=3
export HSAKMT_DEBUG_LEVEL=4

When you re-run, please post the output here, or in the referenced github issue above. Would love to help you get some resolution on this one.
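For example, something along these lines should capture everything to a file you can attach (a sketch, assuming Comfy_UI is launched via main.py as in your logs):

AMD_LOG_LEVEL=3 HSAKMT_DEBUG_LEVEL=4 python3 main.py 2>&1 | tee amd_debug.log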

Hey @sersys ,
The last thing you could do to help debug the issue is to run this minimal script:

#!/usr/bin/env python
import torch

print(torch.version.hip)
print(torch.cuda.get_device_properties())

We’re looking to verify that the ROCm version in your environment is indeed >= 6.3. Additionally, we’re also looking to confirm that gcnArchName == 'gfx1100'.
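If it’s easier, the architecture name alone can also be printed with a one-liner (same idea as the script above; the explicit device index 0 is an assumption that the discrete GPU is the first device):

python3 -c "import torch; print(torch.cuda.get_device_properties(0).gcnArchName)"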

I’ve been unable to reproduce the hipBLASLt warning on a number of Radeon cards (including the RX 7900 XT), even with the same version of PyTorch you used (the nightly build from February 6, 2025).

I’m curious to see what you find, but I’m starting to suspect issues with Comfy_UI picking up the correct Python environment.

Hello,

Thank you very much for your suggestion, and even more for looking into this in such detail, going as far as opening a ticket. I truly appreciate it.

I have not set any batch size; it should be the default, whatever that value is. I couldn’t find any batch-size-related options that would be relevant to threads, and every batch-size-related Google result I could find is about batches of pictures being generated or loaded.

While I am using Debian, I run PyTorch in the AMD pre-built Ubuntu container where ROCm is preinstalled (PyTorch on ROCm — ROCm installation (Linux)).

I’m just using the default workflow that comes with Comfy_UI. According to Google, the workflow files are either .war or .yaml files, but I couldn’t find any of them. Here are the .json files, though as far as I can tell these contain only GUI-related info.

https://www.dropbox.com/scl/fi/g61jr192gkfiseayi9g62/Default-Workflow.json?rlkey=i97rem0eicjjxgczyfh2h4fy2&st=tnzrgdh9&dl=0

dropbox .com/scl/fi/25k9booej0r04koh5nshv/Default-Workflow-api.json?rlkey=k9fiqqbi1d90rn1i7h6uhuuaq&st=gknmpztx&dl=0
(sorry for the incomplete links; new users can only include two links per post)

Here is the output when I run comfy_ui with

export AMD_LOG_LEVEL=3
export HSAKMT_DEBUG_LEVEL=4

dropbox .com/scl/fi/7bdq6lpvi9vgmp2r1qq6x/amddebugpytorch.txt?rlkey=lya2h4ksut2grcp7lw8a9p5ub&st=yekuf7ft&dl=0

Here is the output of

print(torch.version.hip)
print(torch.cuda.get_device_properties())
6.3.42131-fa1d09cbd
_CudaDeviceProperties(name='Radeon RX 7900 XT', major=11, minor=0, gcnArchName='gfx1100', total_memory=20464MB, multi_processor_count=42, uuid=30303033-6430-3863-3030-303030303030, L2_cache_size=6MB)

And the longer version of the same output (probably longer due to the AMD_LOG_LEVEL=3 and HSAKMT_DEBUG_LEVEL=4 variables):
dropbox .com/scl/fi/a2k5na579f2vg6lf3pig5/pyver.txt?rlkey=qhwbauuk70ltpkaxil1wiyvne&st=lelsct8x&dl=0

Once again thank you for helping me.

On your Debian host system, what is your kernel version? Specifically, I’m looking for the output of

uname -r

from your host system that you are running docker run from

uname -r

6.12.12-amd64

When running with Docker, my understanding is that the docker container uses the Linux kernel from the host system.

From the compatibility matrix shown at System requirements (Linux) — ROCm installation (Linux), you are one minor version ahead of the most recent supported Linux kernel. This can possibly cause some problems.

I’ll get your logs posted to the GitHub issue to get some feedback from AMD as well. On our side, we’ll try running Comfy_UI on a gfx1100 system we have with a few different Linux kernels, including 6.12, to see if this is indeed related to the problem.

Hey @sersys,
From the logs you’ve shared, it looks like it’s picking up two GPUs: one is gfx1100 and the other is a gfx1036 GPU.

:3:rocdevice.cpp            :1801: 133669785946d us:  Gfx Major/Minor/Stepping: 11/0/0
:3:rocdevice.cpp            :1803: 133669785949d us:  HMM support: 0, XNACK: 0, Direct host access: 0
:3:rocdevice.cpp            :1805: 133669785950d us:  Max SDMA Read Mask: 0x3, Max SDMA Write Mask: 0x3
:3:rocdevice.cpp            :235 : 133669786607d us:  Numa selects cpu agent[0]=0x41b424a0(fine=0x3dcb4590,coarse=0x448ef230) for gpu agent=0x44f26d20 CPU<->GPU XGMI=0
:3:rocsettings.cpp          :287 : 133669786610d us:  Using dev kernel arg wa = 0
:3:rocdevice.cpp            :1801: 133669786959d us:  Gfx Major/Minor/Stepping: 10/3/6
:3:rocdevice.cpp            :1803: 133669786962d us:  HMM support: 0, XNACK: 0, Direct host access: 0
:3:rocdevice.cpp            :1805: 133669786963d us:  Max SDMA Read Mask: 0x1, Max SDMA Write Mask: 0x1
:3:hip_context.cpp          :49  : 133669787646d us:  Direct Dispatch: 1

The gfx1036 GPU may well be related to the unsupported-GPU-in-hipBLASLt warning. If kernels are being dispatched to that GPU, this could also be related to your core dump issue.

Can you share the output of rocminfo?
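If the full output is long, something like this should show just the agent names (a sketch; the exact field labels can vary between rocminfo versions):

rocminfo | grep -E 'Name|gfx'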

Edit:
Digging around, it looks like the gfx1036 is an iGPU. You may need to disable the iGPU on your system: Prerequisites to use ROCm on Radeon desktop GPUs for machine learning development — Use ROCm on Radeon GPUs
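If getting into the BIOS is inconvenient, an alternative that is sometimes enough is to hide the iGPU from the ROCm runtime with environment variables (a sketch; the assumption that the discrete card enumerates as device 0 should be checked against rocminfo first):

export HIP_VISIBLE_DEVICES=0     # restrict HIP to the first enumerated GPU (assumed to be the RX 7900 XT)
# or, equivalently, at the ROCr runtime level:
export ROCR_VISIBLE_DEVICES=0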

Right now I can’t restart my system to disable the iGPU, but I’ll get back to it once I manage to get around to it. In the meantime, this is my rocminfo:
https://www.dropbox.com/scl/fi/6fiffu761rjqa3lp0wp5m/rocminfo.txt?rlkey=twc0psrw87y6ws9xhvgzu4hfa&st=b8fkfzkn&dl=0

I’ve disabled the iGPU but I still get the same result.
https://www.dropbox.com/scl/fi/a1nezujl7uk88fr9utqbw/igpudisabled.txt?rlkey=fjkus9br27n1d64wvk1kb7q2n&st=1eovl9dw&dl=0

Also, about the kernel version: this is because I’m using the Debian testing repos. However, I’ve tried running Comfy_UI when the kernel was still on an earlier version, and I got the same result back then too.

With the iGPU disabled, it now looks like it’s only picking up the gfx1100 and the warning about hipBLASLt being used on an unsupported architecture is gone.

One other thing that might be going on here: on another issue related to Comfy_UI and PyTorch with ROCm, we discovered that Comfy_UI requires Python 3.12. ROCm’s PyTorch wheels are built against Python 3.10, which may cause some problems.

To work around this, you can try uninstalling your current torch, torchvision, and torchaudio packages and installing PyTorch from source (GitHub - pytorch/pytorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration) using Python 3.12, which is what appears to be used in your environment.
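Roughly, the ROCm source-build flow looks like the following (a sketch based on my reading of the PyTorch README; please treat the exact steps and the PYTORCH_ROCM_ARCH value as assumptions and double-check them against the README):

git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
pip3 install -r requirements.txt
python3 tools/amd_build/build_amd.py                 # hipify the CUDA sources for ROCm
PYTORCH_ROCM_ARCH=gfx1100 python3 setup.py develop   # build for the gfx1100 architecture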

Edit: AMD has noted that the notice about Python 3.10 is outdated. Also, I just recalled that you’re not on WSL2… too many similar issues we’re tracking.