Inductor CPP codegen for WebAssembly target

Can PyTorch generate AOT kernels for the wasm/browser target? For example, could it generate, AITemplate-style, a self-contained, minimal ggml/llama.cpp-style C++ program that benefits from wasm-simd and links only against XNNPACK (which is also available for wasm)?

Related to: Small depthwise Conv1d: maximum perf on CPU? (#4 by smth)
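To make the question concrete, here is a minimal sketch of the kind of self-contained kernel such codegen might emit for the small depthwise Conv1d case. It assumes stride 1, no padding, no bias and a `[C, L]` layout; the function name and layout are hypothetical, not actual Inductor output. The wasm-simd path is compiled only when `__wasm_simd128__` is defined (Emscripten defines it under `-msimd128`). Other builds fall back to the scalar loop.

```cpp
// Hypothetical example of a self-contained kernel file an AOT codegen
// could emit: depthwise Conv1d, stride 1, no padding, no bias.
#include <cassert>
#include <cstddef>
#if defined(__wasm_simd128__)
#include <wasm_simd128.h>
#endif

// x: [C, L] input, w: [C, K] per-channel filters, y: [C, L - K + 1] output.
// Like PyTorch's Conv1d this is a cross-correlation: y[t] = sum_k x[t+k]*w[k].
void depthwise_conv1d(const float* x, const float* w, float* y,
                      std::size_t C, std::size_t L, std::size_t K) {
  assert(K >= 1 && K <= L);  // otherwise L_out would underflow
  const std::size_t L_out = L - K + 1;
  for (std::size_t c = 0; c < C; ++c) {
    const float* xc = x + c * L;
    const float* wc = w + c * K;
    float* yc = y + c * L_out;
    std::size_t t = 0;
#if defined(__wasm_simd128__)
    // Compute 4 adjacent outputs at once with 128-bit wasm SIMD.
    for (; t + 4 <= L_out; t += 4) {
      v128_t acc = wasm_f32x4_splat(0.0f);
      for (std::size_t k = 0; k < K; ++k) {
        v128_t xv = wasm_v128_load(xc + t + k);  // x[t+k .. t+k+3]
        acc = wasm_f32x4_add(acc, wasm_f32x4_mul(xv, wasm_f32x4_splat(wc[k])));
      }
      wasm_v128_store(yc + t, acc);
    }
#endif
    // Scalar tail on SIMD builds; the whole loop on non-SIMD builds.
    for (; t < L_out; ++t) {
      float acc = 0.0f;
      for (std::size_t k = 0; k < K; ++k) acc += xc[t + k] * wc[k];
      yc[t] = acc;
    }
  }
}
```

Something like `em++ -O3 -msimd128 -c conv1d.cpp` should build it for wasm, while a native compiler simply takes the scalar path, which makes it easy to test the same file on the host.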

I guess for this to work, we’d need raw, self-contained C++ files as output, plus maybe Makefiles / compilation commands / CMake / etc., so that we can then build them with Emscripten.

Maybe some version could be built for wasm as well, but super-aggressive tree-shaking would be needed to keep the size down. Ideally we’d only need the wasm-simd C++ code for the ops actually used, plus XNNPACK.
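Along the same lines, here is a hypothetical sketch of what tree-shaken output could look like. It is one C++ file containing only the ops the graph uses (here a toy linear + ReLU with made-up, baked-in weights) plus a single exported C entry point. `run_model`, the weights and the op choice are illustrative assumptions. In practice the heavy ops would presumably call into XNNPACK, which is left out here rather than guessing at its API. `EMSCRIPTEN_KEEPALIVE` keeps the entry point exported, and the linker can then drop anything not reachable from it.

```cpp
// Hypothetical "model.cpp" emitted by an AOT compiler: only the ops this
// graph uses are present, plus one exported C entry point.
#include <cassert>
#include <cstddef>
#if defined(__EMSCRIPTEN__)
#include <emscripten/emscripten.h>
#define EXPORT extern "C" EMSCRIPTEN_KEEPALIVE
#else
#define EXPORT extern "C"
#endif

namespace {
// Weights baked into the binary as constants (illustrative values only).
constexpr std::size_t kIn = 3, kOut = 2;
constexpr float kW[kOut][kIn] = {{1.0f, 0.0f, -1.0f}, {0.5f, 0.5f, 0.5f}};
constexpr float kB[kOut] = {0.0f, -1.0f};

// y = W x + b
void linear(const float* x, float* y) {
  for (std::size_t o = 0; o < kOut; ++o) {
    float acc = kB[o];
    for (std::size_t i = 0; i < kIn; ++i) acc += kW[o][i] * x[i];
    y[o] = acc;
  }
}

// y = max(y, 0), elementwise, in place.
void relu_inplace(float* y, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) y[i] = y[i] > 0.0f ? y[i] : 0.0f;
}
}  // namespace

// Hypothetical entry point: input has kIn floats, output has kOut floats.
EXPORT void run_model(const float* input, float* output) {
  assert(input != nullptr && output != nullptr);
  linear(input, output);
  relu_inplace(output, kOut);
}
```

Building with something like `em++ -O3 model.cpp -o model.js` would produce the `.wasm` plus JS glue, and the same file compiles natively for testing.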