How portable are torch::save()/load() models in C++?

How portable is torch::save()/torch::load() across architectures and PyTorch versions?

I’m a developer working on adding ARM support to MLC@Home, a distributed computing project that trains many simple neural networks in parallel. Currently we support amd64 clients (Windows/Linux 64-bit), and I’m trying to add ARM32/ARM64 support. Note that this is training in C++, not inference, so as I understand it, TorchScript and ONNX aren’t options.

The basic workflow is: a volunteer connects to the server, downloads the client and a dataset, trains for N epochs, saves the resulting network (torch::save), and sends it back to the server. The server checks whether the network has finished training (loss is below a threshold); if not, the same network is re-sent to another client for another N epochs of training (loaded with torch::load).

This has been working great for our x86_64 clients on Linux and Windows, linked against PyTorch 1.5.0 and 1.5.1. I recently built new clients for ARM, linked against libtorch v1.6.0. However, if an ARM client receives a network that had previously been trained on an x86 client and continues training it, the resulting network ends up with NaNs.

I’m recompiling the ARM clients against v1.5.1 to try that… but… how portable is torch::save()/torch::load() expected to be across architectures and PyTorch versions?

Is there any other portable model option for pure libtorch code?