I’ve been struggling to solve a hard-shutdown issue on my computer. When I run any application that uses PyTorch and CUDA, the machine powers off just as the main LLM is about to load into VRAM. It is as if some overload protection is triggered and it just hard-crashes. The motherboard still has standby power (the network adapter stays lit), but the machine needs a full power pull and re-plug before it will boot again.
All hardware and software are up to date and fully stress-tested. 64 GB of ECC RAM, NVIDIA A5000 24 GB GPU.
For example, in ComfyUI the UNet, LoRA, and VAE all load fine. When it hits -----Start infer----- it may run for a few seconds, then crash. It happens so fast that nothing gets written to any of the logs I have tried to gather.
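For what it's worth, this is the kind of logging I've been attempting: a minimal sketch (the class name and file path are my own, not from ComfyUI) that flushes and fsyncs after every line, so entries should be on disk even if the machine powers off mid-inference:

```python
import os

# Hypothetical crash-surviving logger: flush + fsync after every write so
# each line is committed to disk before a hard power-off can lose it.
class CrashLog:
    def __init__(self, path):
        # line-buffered text mode
        self.f = open(path, "a", buffering=1)

    def write(self, msg):
        self.f.write(msg + "\n")
        self.f.flush()
        os.fsync(self.f.fileno())  # force the OS to commit the data to disk

log = CrashLog("crash_trace.log")
log.write("before model load")
```

Even with this, nothing from the moment of the crash ever makes it to the file.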
This only happens with PyTorch-type operations. I can run 36 hours of AEC-type renders, etc. on the same machine without a single crash.
I was on PyTorch 2.6 with CUDA 12.6, and am now on 2.7 with CUDA 12.8 = same crash.
I have tried checkpoints of all sizes, different float precisions, models, etc. as well, even low-VRAM ones that run in 12 GB of VRAM… still the same.
Last few lines of the last crash:
[2025-04-29 07:41:35.734] Pipelines loaded with dtype=torch.float16 cannot run with cpu device. It is not recommended to move them to cpu as running them will fail. Please make sure to use an accelerator to run the pipeline in inference, due to the lack of support for float16 operations on this device in PyTorch. Please, remove the torch_dtype=torch.float16 argument, or use another device for inference.
[2025-04-29 07:41:35.735] Requested to load AutoencodingEngine
[2025-04-29 07:41:35.900] loaded completely 20147.426822662354 186.42292404174805 True
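To rule ComfyUI out, I can also trigger the shutdown with a bare PyTorch script. A minimal sketch of what I mean (function name and sizes are my own; it just fills VRAM and runs sustained fp16 matmuls, similar to the load pattern at the start of inference, and assumes a CUDA build of PyTorch):

```python
import torch

# Minimal repro sketch: ramp VRAM usage, then run sustained fp16 matmuls.
# Requires a CUDA-capable GPU; sizes are illustrative, not from ComfyUI.
def stress_vram(gb=20, steps=100):
    if not torch.cuda.is_available():
        raise RuntimeError("CUDA device required")
    dev = torch.device("cuda")
    # Roughly `gb` GB of fp16 elements (2 bytes each) to fill VRAM.
    n = int(gb * (1024 ** 3) / 2)
    buf = torch.empty(n, dtype=torch.float16, device=dev)
    a = torch.randn(4096, 4096, dtype=torch.float16, device=dev)
    for i in range(steps):
        b = a @ a  # sustained tensor-core load, like inference starting
        torch.cuda.synchronize()
        print(f"step {i} ok", flush=True)

# Run on the affected machine:
# stress_vram()
```

On my machine this powers the box off within seconds, same as ComfyUI.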
Any assistance would be highly appreciated.