Code Generation for PyTorch C++ Bare Metal (without OS)?

I’m interested in generating C/C++ code to run a PyTorch model. Specifically, I would like this code to run on a bare-metal device (no OS), without any dependencies on things an OS would typically provide, such as threads or file I/O. It would also be great if it compiled with as small a dynamic memory footprint as possible (using static RAM wherever possible). Is this currently possible, or is it a future goal of the PyTorch C++ frontend? Also, are there any metrics on the memory usage of loading the PyTorch headers/libraries/dependencies in C++?
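To make the constraints concrete, here is a minimal hand-written sketch of the kind of code I have in mind: a single dense layer with ReLU, with weights stored as compile-time constants and the activation buffer given static storage, so there is no heap allocation, threading, or file I/O. This is not the output of PyTorch or any other tool; the sizes, weight values, and names (kWeights, g_output, run_inference) are all made up for illustration.

```cpp
#include <array>
#include <cstddef>

// Hypothetical layer sizes, chosen only for illustration.
constexpr std::size_t kInputs = 3;
constexpr std::size_t kOutputs = 2;

// Weights and bias are compile-time constants, so a toolchain can place
// them in read-only memory (typically flash on a microcontroller).
constexpr std::array<std::array<float, kInputs>, kOutputs> kWeights = {{
    {{0.5f, -1.0f, 0.25f}},
    {{1.0f, 0.0f, -0.5f}},
}};
constexpr std::array<float, kOutputs> kBias = {{0.1f, -0.2f}};

// Activation buffer with static storage duration: its size is fixed at
// compile time and nothing is allocated on the heap.
static std::array<float, kOutputs> g_output;

// Dense layer followed by ReLU: out = max(0, W * in + b).
// Uses no OS services: no threads, no file I/O, no dynamic allocation.
const std::array<float, kOutputs>& run_inference(
    const std::array<float, kInputs>& in) {
    for (std::size_t o = 0; o < kOutputs; ++o) {
        float acc = kBias[o];
        for (std::size_t i = 0; i < kInputs; ++i) {
            acc += kWeights[o][i] * in[i];
        }
        g_output[o] = acc > 0.0f ? acc : 0.0f;
    }
    return g_output;
}
```

On a real device the caller would fill the input array from a sensor or peripheral and read the result from the returned reference. Because every buffer is sized at compile time, the total RAM usage would show up in the linker map rather than depending on runtime behavior.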

I just came across Glow, which seems to do what I’m after?

I was about to post the same, but I wasn’t sure how “low level” you need to go.

Glow builds and runs on macOS and Linux. The software depends on a modern C++ compiler that supports C++11, on CMake, LLVM, protocol buffers, and libpng.