[Seeking Volunteers / Feedback] cuda-morph — student project, CUDA→Ascend NPU compatibility shim

Hi everyone,

I’m a CS student experimenting with a small open-source project called cuda-morph (github.com/JosephAhn23/cuda-morph), and I’d like to ask this community for feedback or volunteer help. Before anyone spends time on it, I want to be upfront about what it actually is.

What it does

cuda-morph is a thin Python compatibility layer that intercepts torch.cuda.* calls and reroutes them to torch.npu.* equivalents via Huawei’s torch_npu. The goal is that existing CUDA-oriented PyTorch scripts run on Ascend NPUs with minimal or no code changes. It also includes shims for HuggingFace Transformers, DeepSpeed, flash-attn, and vLLM.
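
To make the rerouting idea concrete, here is a minimal sketch of the general technique. It is not the actual cuda-morph code: redirect_namespace and the two stand-in objects are hypothetical names used only for illustration, and the stand-ins take the place of torch.cuda and torch.npu so the snippet runs without torch or torch_npu installed.

```python
import types


def redirect_namespace(source, target, names):
    """Point selected attributes of `source` at the same-named attributes of `target`.

    Names that `target` does not provide are skipped, so `source` keeps
    whatever it had (or lacked) for them. Returns the names actually patched.
    """
    patched = []
    for name in names:
        if hasattr(target, name):
            setattr(source, name, getattr(target, name))
            patched.append(name)
    return patched


# Hypothetical stand-ins for torch.cuda and torch.npu (assumption: illustration only).
fake_cuda = types.SimpleNamespace(is_available=lambda: False, device_count=lambda: 0)
fake_npu = types.SimpleNamespace(is_available=lambda: True, device_count=lambda: 8)

# "synchronize" is missing on the stand-in target, so it is left unpatched.
patched = redirect_namespace(
    fake_cuda, fake_npu, ["is_available", "device_count", "synchronize"]
)
print(patched)
print(fake_cuda.is_available(), fake_cuda.device_count())
```

After the call, code that asks the "CUDA" namespace whether a device is available gets the NPU namespace's answer. A real shim would also have to deal with device strings such as "cuda:0" and with libraries that check for CUDA in other ways, which this sketch does not attempt.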

What I want to be honest about

I don’t have access to Ascend hardware. I wrote the Ascend backend based entirely on public torch_npu documentation and the Gitee source. It compiles and passes 460+ tests in CPU-fallback mode, but it has never been run on a real NPU. I have no way to verify it works.

I suspect that, to people working in this space professionally, this is not a groundbreaking idea, since torch_npu already handles the hard parts at the C++/CANN level. This project sits one layer above that and tries to fix the “last mile” ecosystem compatibility issues (libraries that hardcode CUDA assumptions). It may already be solved internally, or simply not worth the effort. I genuinely don’t know.

What I’m looking for

If anyone here has access to Ascend 910B or 310P hardware and is willing to run a quick smoke test, that would be incredibly valuable. Even an “it crashed on import” report is useful feedback. I’m also open to being told the approach is wrong or redundant; I’d rather know.
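
If it helps, the smallest useful smoke test could look something like the sketch below. It only checks whether a list of modules can be imported and reports the error type for any that cannot. The helper name import_smoke_test is hypothetical, and nothing here touches the NPU itself.

```python
import importlib


def import_smoke_test(module_names):
    """Try to import each module and return {name: "ok" or an error summary}."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = "ok"
        except Exception as exc:
            # Record the failure and keep going, so one broken import
            # does not hide the status of the others.
            results[name] = f"{type(exc).__name__}: {exc}"
    return results
```

On an Ascend machine you would call it as import_smoke_test(["torch", "torch_npu", "cuda_morph"]) and paste the resulting dictionary into a reply. Note that "cuda_morph" is my assumption for the import name in this example; please check the repository README for the real one.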

No pressure at all. This is a hobby/learning project, not a funded effort, and I’m aware I’m a solo student asking a professional community for help. Any guidance is appreciated.

GitHub: https://github.com/JosephAhn23/cuda-morph