RISC-V Architecture Support Roadmap for PyTorch

Summary

We are the XuanTie team from Alibaba DAMO Academy, and we would like to propose a roadmap for enabling native RISC-V architecture support in PyTorch. This RFC aims to:

  1. Express our commitment to contributing RISC-V support to PyTorch
  2. Share our proposed roadmap for implementation
  3. Gather community feedback and identify potential blockers
  4. Collaborate with the PyTorch community to make RISC-V a first-class citizen in PyTorch

Motivation

Why Now?

RISC-V is an open-standard instruction set architecture (ISA) that has been gaining significant momentum in the industry. Historically, RISC-V’s AI efforts have focused primarily on edge/embedded scenarios, where lightweight inference runtimes were sufficient and native PyTorch support was not a priority.

However, the landscape is rapidly changing:

  • High-Performance RISC-V SoCs Are Emerging: The RISC-V ecosystem has made remarkable progress in high-performance computing. Multi-core, high-frequency RISC-V processors with advanced vector extensions (RVV 1.0) and matrix extensions (RVM) are now becoming available.

  • Server-Class RISC-V CPUs Are on the Horizon: Several companies are developing server-grade RISC-V processors targeting data center workloads. This opens up new possibilities for running full AI/ML software stacks directly on RISC-V hardware.

  • The Timing Is Right: With these hardware advancements, the conditions for running PyTorch natively on RISC-V are now being met. Establishing PyTorch support now will ensure the ecosystem is ready when server-class RISC-V hardware becomes widely available.

The Need for Native Support

Currently, PyTorch has limited support for RISC-V, which creates barriers for:

  • Researchers and developers working on RISC-V-based AI hardware
  • Companies developing RISC-V AI accelerators
  • The broader RISC-V ecosystem that wants to leverage PyTorch’s capabilities

By enabling comprehensive RISC-V support, we can:

  • Expand PyTorch’s reach to the growing RISC-V ecosystem
  • Enable AI workloads on RISC-V-based edge devices and servers
  • Foster innovation in open-source hardware and software for AI

Proposed Roadmap

Phase 1: CI Infrastructure and Validation

Goal: Establish a robust CI pipeline for RISC-V builds and testing

  • Complete CI Execution Flow Validation

    • Ensure PyTorch can be built and tested on RISC-V architecture
    • Validate that all existing test suites pass on RISC-V targets
    • Identify and fix architecture-specific issues
  • RISC-V Development Board Contribution

    • We are prepared to provide high-performance RISC-V development boards for PyTorch’s CI infrastructure
    • This will enable continuous testing and validation of RISC-V support
    • We will work with the PyTorch infrastructure team to integrate these boards
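
As an illustration of the kind of architecture check a CI job could run before building or testing, here is a minimal sketch. It assumes a Linux runner whose /proc/cpuinfo exposes an `isa` line in the underscore-separated form (for example `rv64imafdcv_zicsr_zifencei`) without version suffixes. The helper names (`parse_isa_string`, `has_vector`, `describe_runner`) are hypothetical and are not part of PyTorch.

```python
import platform
import re
from typing import Optional

# Base width followed by the run of single-letter extensions.
_ISA_RE = re.compile(r"^rv(32|64|128)([a-z]*)")


def parse_isa_string(isa: str) -> dict:
    """Split an ISA string such as 'rv64imafdcv_zicsr_zifencei'.

    Assumption: multi-letter extensions are separated by underscores and
    no version suffixes (e.g. '2p1') are present.
    """
    text = isa.strip().lower()
    match = _ISA_RE.match(text)
    if match is None:
        raise ValueError(f"not a RISC-V ISA string: {isa!r}")
    remainder = text[match.end():]
    return {
        "xlen": int(match.group(1)),
        "single": set(match.group(2)),
        "multi": {part for part in remainder.split("_") if part},
    }


def has_vector(isa: str) -> bool:
    """True if the full 'V' extension is listed (Zve* subsets are ignored)."""
    return "v" in parse_isa_string(isa)["single"]


def read_isa_from_cpuinfo(path: str = "/proc/cpuinfo") -> Optional[str]:
    """Return the value of the first 'isa' line, or None if unavailable."""
    try:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                key, sep, value = line.partition(":")
                if sep and key.strip() == "isa":
                    return value.strip()
    except OSError:
        return None
    return None


def describe_runner() -> dict:
    """Summarize what the current machine reports about itself."""
    machine = platform.machine()
    report = {"machine": machine, "is_riscv": machine.startswith("riscv")}
    isa = read_isa_from_cpuinfo() if report["is_riscv"] else None
    if isa is not None:
        report["isa"] = isa
        report["has_rvv"] = has_vector(isa)
    return report


if __name__ == "__main__":
    print(describe_runner())
```

A job could log this report at startup so that architecture-specific failures can be correlated with the extensions a given board actually provides.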

Phase 2: High-Performance Micro-Kernel Library

Goal: Deliver optimized implementations for core ATen operators

  • Develop a RISC-V Optimized uKernel Library

    • Similar in role to KleidiAI on the Arm architecture
    • Leverage the RISC-V Vector Extension (RVV) and Matrix Extension (RVM) for vector and matrix operations
    • Target core ATen operators first (matmul, convolution, etc.)
  • Incremental Operator Coverage

    • Start with the most commonly used operators in popular models
    • Gradually expand coverage to remaining operators
  • Performance Benchmarking

    • Establish baseline performance metrics
    • Compare with reference implementations
    • Continuous performance regression testing
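
The benchmarking steps above could be sketched roughly as follows. This is a standard-library-only illustration, not a proposed harness: the function names, the 10% tolerance, and the naive list-based matmul used as the correctness reference are all assumptions made for the example. A real setup would time actual ATen operators.

```python
import statistics
import time
from typing import Callable, List

Matrix = List[List[float]]


def median_runtime(fn: Callable[[], object], warmup: int = 2, repeats: int = 7) -> float:
    """Median wall-clock time of fn() in seconds, after a few warm-up calls."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def is_regression(current: float, baseline: float, tolerance: float = 0.10) -> bool:
    """True if `current` is more than `tolerance` (fractional) slower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return current > baseline * (1.0 + tolerance)


def reference_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive triple-loop matmul, used only as a correctness reference."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def allclose(x: Matrix, y: Matrix, rtol: float = 1e-5, atol: float = 1e-8) -> bool:
    """Element-wise closeness check of x against the reference y."""
    return all(
        abs(p - q) <= atol + rtol * abs(q)
        for row_x, row_y in zip(x, y)
        for p, q in zip(row_x, row_y)
    )


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    # Placeholder: a real run would call the optimized kernel here.
    candidate = reference_matmul
    assert allclose(candidate(a, b), reference_matmul(a, b))
    elapsed = median_runtime(lambda: candidate(a, b))
    print(f"median runtime: {elapsed:.6f} s")
```

Using the median rather than the mean makes the comparison less sensitive to occasional scheduling noise, which matters on shared CI boards.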

Phase 3: torch.compile Backend Extension

Goal: Enable high-performance graph compilation for RISC-V

  • Develop RISC-V Backend for torch.compile

    • Implement code generation targeting RISC-V architecture
    • Support RVV/RVM vectorization in generated code
    • Enable operator fusion and other graph-level optimizations
  • Integration with Inductor

    • Work with the PyTorch compiler team to integrate RISC-V support
    • Ensure compatibility with existing compilation pipelines
    • Support both eager and compiled execution modes

Phase 4: Triton/TileLang Native Support

Goal: Enable native Triton kernel development for RISC-V

  • Triton Backend for RISC-V

    • Develop RISC-V code generation backend for Triton
    • Support RVV/RVM in Triton kernels
    • Enable custom kernel development for RISC-V targets
  • TileLang Integration

    • Explore TileLang support for RISC-V architecture
    • Enable tile-based programming model for RISC-V AI accelerators

Phase 5: LLM Inference Framework Support

Goal: Enable large language model deployment on RISC-V

  • vLLM Support

    • Port vLLM to RISC-V architecture
  • SGLang Support

    • Enable SGLang runtime on RISC-V

Questions for the Community

We would like to gather feedback on the following:

  1. Technical Blockers

    • What known technical challenges for RISC-V support in PyTorch should we be aware of?
  2. CI Infrastructure

    • We are able to provide RISC-V development boards for community CI testing. What is the recommended approach to integrate these boards into PyTorch’s existing GitHub Actions CI pipeline?
    • What are the requirements for contributing RISC-V hardware to PyTorch CI?
    • What is the preferred approach for integrating new architecture support?
  3. Prioritization

    • Which operators/features should be prioritized for RISC-V optimization?
    • Are there specific use cases the community would like to see supported first?
  4. Collaboration

    • Are there other teams/companies working on RISC-V support for PyTorch?
    • How can we best coordinate efforts to avoid duplication?
  5. Testing and Validation

    • What test coverage is expected for new architecture support?
    • Are there specific benchmarks we should target?

Our Commitment

We are committed to:

  • Contributing high-quality, well-tested code
  • Maintaining RISC-V support long-term
  • Providing hardware resources for CI infrastructure
  • Collaborating openly with the PyTorch community
  • Following PyTorch’s contribution guidelines and code standards

Call for Feedback

We welcome any feedback, suggestions, or concerns from the PyTorch community. Our goal is to work together to make RISC-V a well-supported architecture in PyTorch, benefiting both the PyTorch and RISC-V ecosystems.

Please share your thoughts on:

  • The proposed roadmap
  • Technical considerations we may have missed
  • Potential collaboration opportunities
  • Any blockers or concerns you foresee

We look forward to working with the community to bring native RISC-V support to PyTorch!

@ptrblck @malfet