Opdiff: cross-backend PyTorch operator testing + results dashboard

I’ve been experimenting with opdiff, a framework for systematically testing PyTorch operators and modules across multiple backends (Torch, ONNX Runtime, CoreML, ExecuTorch, etc.).
Repo: https://github.com/0xShug0/opdiff/

Tests are defined in YAML and can sweep hundreds of operator and module variants across different backend configs, then compare results against a chosen baseline for correctness and numerical parity.
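For a rough idea of what a test definition looks like, here's a hypothetical sketch (not opdiff's actual schema, just an illustration of the shape: one op, a swept input spec, a list of backends, and a baseline with tolerances):

```yaml
# Hypothetical sketch -- field names are illustrative, not opdiff's real schema.
op: torch.nn.functional.softmax
inputs:
  - shape: [8, 128]
    dtype: float32
backends: [torch, onnxruntime, coreml, executorch]
baseline: torch            # results from other backends are diffed against this
tolerance:
  rtol: 1.0e-4
  atol: 1.0e-5
```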

While running large operator sweeps, I realized that summary stats hide a lot of useful signal: cases where export succeeds but runtime fails, backend-specific numerical drift, config-specific failures, and so on.
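To make "numerical drift" concrete: instead of a bare pass/fail, each comparison can report how far a backend's outputs strayed from the baseline. Here's a minimal stdlib-only sketch of that idea (`parity_report` is a hypothetical helper, not opdiff's actual API), using the usual `atol + rtol * |baseline|` tolerance test:

```python
def parity_report(baseline, candidate, rtol=1e-4, atol=1e-5):
    """Compare a candidate backend's flat outputs against a baseline.

    Hypothetical helper for illustration; opdiff's real comparison
    logic may differ. Returns the max absolute difference alongside
    the pass/fail verdict, so a sweep can surface small drift that a
    binary result would hide.
    """
    max_abs_diff = 0.0
    all_close = True
    for b, c in zip(baseline, candidate):
        diff = abs(b - c)
        max_abs_diff = max(max_abs_diff, diff)
        # Standard mixed relative/absolute tolerance check.
        if diff > atol + rtol * abs(b):
            all_close = False
    return {"max_abs_diff": max_abs_diff, "all_close": all_close}

# Tiny drift stays within tolerance but is still reported:
print(parity_report([1.0, 2.0, 3.0], [1.0, 2.0001, 3.0]))
# A real divergence is flagged:
print(parity_report([1.0, 2.0], [1.0, 2.5]))
```

Logging `max_abs_diff` per run is what lets a dashboard distinguish "bit-exact", "within tolerance", and "diverging" configs rather than collapsing them all into one pass rate.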

To make the results easier to explore, I built a small dashboard:

[ptrblck: link removed as it was flagged as a security risk]

You can search operators/modules, compare backend configs, and see export vs runtime failures instead of just aggregated numbers.

Some of these runs have already surfaced real issues (a couple of ExecuTorch bugs were filed). If you spot something interesting, feel free to dig in or report upstream.