How can I force an operator to be available when statically linking?

I’m building a program that uses libtorch to perform inference, linking against the static libs. Things work fine if I link libtorch_cpu.a with the whole-archive approach, but I need to be able to build without whole-archive. Without whole-archive, the build succeeds, but at runtime I get an error that aten::mul isn’t available. Web searches turn up others hitting the same issue, but the usual solution is whole-archive, which is exactly what I need to avoid.

Given that whole-archive seems to be the universal answer, I’m sure it will be tedious, but I’d like to try manually registering just the operators my particular inference path needs. My plan: run the executable, see which operator it fails on, update the code to manually register that operator, rebuild, run, see which operator it complains about next, register that one too, and repeat as needed.

The problem is that I can’t figure out how to do the manual registration. I’ve found some stuff about registering custom operators, but these aren’t custom ops.

In the pytorch source tree, I found some code that registers a couple of aten operators - I stole it and pasted it into my own code, and it does look like it now registers those two operators twice, so I think the approach might work. But despite a lot of searching, I haven’t been able to find analogous code for the operators I’m looking for. For what it’s worth - this is the code I’ve added to my program:

#include <torch/csrc/jit/runtime/register_ops_utils.h>

namespace torch {
namespace jit {

RegisterOperators reg_ops_directly(
    {Operator(
         "aten::_ncf_unsqueeze(Tensor(a) self, int ndim) -> Tensor(a)",
         [](Stack& stack) {
           const int64_t ndim = pop(stack).toInt();
           auto self = pop(stack).toTensor();
           c10::SmallVector<int64_t, 8> sizes(ndim, 1);
           AT_ASSERT(self.dim() == 1);
           sizes.at(1) = self.size(0);
           push(stack, self.reshape(sizes));
         },
         aliasAnalysisFromSchema()),
     Operator(
         "aten::_ncf_view(Tensor(a) self, int[] input_shape, int normalized_ndim) -> Tensor(a)",
         [](Stack& stack) {
           const int64_t normalized_ndim = pop(stack).toInt();
           auto input_shape = pop(stack).toIntList();
           auto self = pop(stack).toTensor();
           const int64_t input_ndim = input_shape.size();
           c10::SmallVector<int64_t, 8> sizes(input_ndim, 1);
           for (int i = 0; i < input_ndim - normalized_ndim; ++i) {
             sizes.at(i) = input_shape.get(i);
           }
           push(stack, self.reshape(sizes));
         },
         aliasAnalysisFromSchema())});

} // namespace jit
} // namespace torch

Without that code added, those two operators are listed in torch::jit::getAllOperators once, and with the code added, those two operators are listed twice.

Can I use this approach to manually register the aten::mul operator? I fully realize there could be a lot more operators after that, which I’m OK with - if I can just get the first one situated, I hope I’ll be able to work through the others.

If there’s a different / better way (that doesn’t involve whole-archive), I’d be up for suggestions there too. One thing I was considering: extract the necessary .o files from libtorch_cpu.a into a “sub-library”, link the (smaller) sub-library with whole-archive, and link the rest without it. I’ve tried this a bunch, but haven’t yet managed to determine which .o files need to be in the sub-library to make it work. I like the idea of manually registering the ops better because it gives me finer control over what gets linked in, but like I said, open to suggestions.
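For reference, this is roughly what I’ve been attempting for the sub-library route (the specific .o name below is a placeholder - I’ve been using nm to hunt for the registration symbols, which may or may not be the right heuristic):

```shell
# Extract all objects from the big archive.
mkdir extracted && cd extracted
ar x ../libtorch_cpu.a
cd ..

# Look for objects whose symbol tables mention JIT operator registration.
for f in extracted/*.o; do
  nm -C "$f" 2>/dev/null | grep -q 'jit::RegisterOperators' && echo "$f"
done

# Pack candidates into a sub-archive (object name here is a placeholder).
ar rcs libtorch_reg.a extracted/register_special_ops.cpp.o

# Link only the sub-archive with whole-archive; the rest normally.
g++ -o app main.o \
  -Wl,--whole-archive -L. -ltorch_reg -Wl,--no-whole-archive \
  -ltorch_cpu -lc10 -lpthread
```

So far I haven’t found the right set of objects this way, which is why the manual-registration route appeals to me more.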

[ Full disclosure: I posted an issue on github a while back for this same issue here: Static Linking C++, Op not available at runtime · Issue #111654 · pytorch/pytorch · GitHub ]

I really appreciate any guidance - thanks!