LibTorch with NVIDIA PyG Container/C++ within training loop

Hi, I’m doing some novel research integrating PyG with the flecs entity component system.
I am currently trying to train a model that learns the heuristic for A* pathfinding; the pathfinding code itself is written in C++. Specifically, I am trying to understand how to use the total number of nodes in the open set as a loss signal.
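To make the "open set size as loss" idea concrete, here is a minimal, standard-library-only sketch of what I mean: a grid A* that reports, alongside the path cost, how many nodes it expanded. All names here (`SearchStats`, `astar_expansions`) are mine for illustration, not from any library, and `h` stands in for the learned heuristic.

```cpp
#include <functional>
#include <queue>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Sketch: the expansion count is the scalar I want to turn into a loss
// signal -- a better learned heuristic should mean fewer expansions.
struct SearchStats {
    int path_cost = -1;  // -1 if the goal was unreachable
    int expanded = 0;    // nodes popped from the open set and settled
};

// 4-connected grid, 0 = free cell, 1 = wall. `h` is any callable
// (row, col) -> double standing in for the learned heuristic.
template <typename Heuristic>
SearchStats astar_expansions(const std::vector<std::vector<int>>& grid,
                             int sr, int sc, int gr, int gc, Heuristic h) {
    const int R = static_cast<int>(grid.size());
    const int C = static_cast<int>(grid[0].size());
    auto id = [C](int r, int c) { return r * C + c; };

    using Item = std::pair<double, int>;  // (f-score, node id)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> open;
    std::unordered_map<int, int> g;  // best known cost-so-far
    std::unordered_set<int> closed;  // settled nodes
    SearchStats stats;

    g[id(sr, sc)] = 0;
    open.push({h(sr, sc), id(sr, sc)});
    const int dr[] = {1, -1, 0, 0}, dc[] = {0, 0, 1, -1};

    while (!open.empty()) {
        const int n = open.top().second;
        open.pop();
        if (!closed.insert(n).second) continue;  // skip stale duplicate entry
        ++stats.expanded;
        const int r = n / C, c = n % C;
        if (r == gr && c == gc) { stats.path_cost = g[n]; return stats; }
        for (int k = 0; k < 4; ++k) {
            const int nr = r + dr[k], nc = c + dc[k];
            if (nr < 0 || nr >= R || nc < 0 || nc >= C || grid[nr][nc]) continue;
            const int ng = g[n] + 1;
            auto it = g.find(id(nr, nc));
            if (it == g.end() || ng < it->second) {
                g[id(nr, nc)] = ng;
                open.push({ng + h(nr, nc), id(nr, nc)});
            }
        }
    }
    return stats;  // open set exhausted, no path
}
```

Since the expansion count is not differentiable, my current thinking is to use it either to weight a differentiable supervised term or in a REINFORCE-style update, with the count simply wrapped in a tensor on the training side.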
I am trying to build against the latest LibTorch inside the PyG container from NVIDIA NGC (PyG | NVIDIA NGC), with the files mounted via a Modal volume (Volumes | Modal Docs).

Currently I am hitting a symbol lookup error when the binary loads its shared libraries:

./example-app: symbol lookup error: ./example-app: undefined symbol: _ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKSs

I am still investigating, but I would love help with either matching the right LibTorch version to this container, or otherwise understanding the best route for calling custom C++ within the training loop (especially for continual learning models).

Update: I was able to get LibTorch linking by using a build whose CUDA version matches the container's:
https://download.pytorch.org/libtorch/cu124/libtorch-cxx11-abi-shared-with-deps-2.4.0%2Bcu124.zip
(For anyone hitting the same error: besides plain version mismatches, C++ ABI mismatches are a common cause — the `RKSs` at the end of the mangled symbol is the old pre-C++11 `std::string` ABI, so mixing a cxx11-ABI binary with a pre-cxx11-ABI LibTorch, or vice versa, produces exactly this kind of undefined-symbol failure.)
In terms of the C++ loss, I am considering a custom C++ PyTorch operator (PyTorch Custom Operators — PyTorch Tutorials 2.4.0+cu121 documentation) to run the training code, with TorchScript for production.
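For reference, a minimal sketch of what that operator registration might look like, following the custom-operators tutorial. The namespace `mylib`, the op name, and the placeholder body are all mine; this assumes the usual LibTorch headers and CMake setup, and the real implementation would call into the flecs-side A* search.

```cpp
#include <torch/library.h>
#include <torch/torch.h>

// Hypothetical sketch: expose the A* expansion count as a custom op so the
// Python training loop can call it like any other torch op. The op is not
// differentiable; the count comes back as a plain scalar tensor that can be
// logged, used to weight a differentiable loss term, or fed to a
// REINFORCE-style update.
torch::Tensor astar_node_count(torch::Tensor grid, int64_t sr, int64_t sc,
                               int64_t gr, int64_t gc) {
  // ... run the C++ A* with the current learned heuristic and count
  // expansions; hard-coded 0 here as a placeholder.
  int64_t expanded = 0;
  return torch::tensor(expanded, torch::kLong);
}

// Registration per the PyTorch custom-operators tutorial; "mylib" is an
// arbitrary namespace chosen for this sketch.
TORCH_LIBRARY(mylib, m) {
  m.def("astar_node_count(Tensor grid, int sr, int sc, int gr, int gc) -> Tensor",
        &astar_node_count);
}
```

After building this into a shared library and loading it, the op should be callable from Python as `torch.ops.mylib.astar_node_count(...)`.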