Advice for debugging a TorchScript module from C++

I have trained a model in Python and am now trying to load it from a C++ program. Unfortunately, it crashes on the line marked // crash here

#include <torch/script.h>
#include <iostream>
#include <vector>
#include "nets/nets.h"
#include "util/runfiles.h"

int main(int argc, char** argv) {
  std::cout << "Nets example" << std::endl;

  // Custom code that loads the module
  auto runfiles = MakeRunfiles(argv[0]);
  torch::jit::script::Module segnet3 = LoadSegNet3(*runfiles);
  std::cout << "Loaded SegNet3" << std::endl;

  // Make a fake image.
  torch::Tensor input = torch::randn({1, 3, 300, 300});
  std::cout << "Made random input" << std::endl;

  std::vector<torch::jit::IValue> inputs;
  inputs.push_back(input);

  // crash here
  torch::Tensor output = segnet3.forward(inputs).toTensor();
  std::cout << "Output done" << std::endl;
}

The output of the program is

Nets example
Loaded SegNet3
Made random input

and it exits without writing any error message.

What can I do to debug the situation? I am wondering if maybe my input tensor is of the wrong dimensionality. Is there a way to check the dimensions the network is expecting? And/or is there a way to print some debug information about the structure of the network?

A segfault should not happen with that program, even when the shapes do not match.
Most often, segfaults happen when people use from_blob and the memory owner goes out of scope, but this is not the case here.

  • Does this also happen with a small / trivial module? If not, there might be a specific operation that it does not like. Ideally you would be able to isolate it.
  • If it happens even with a trivial module, you could compile libtorch / PyTorch in Debug mode and run your program under gdb (gdb -ex run --args myprogram) to get a backtrace (type bt after the segfault).

Best regards

Thomas

Thanks, Tom. It’s not so easy for me to use gdb since I’m on Windows, but I did happen to discover the issue. In my loading code, I had moved the model to the GPU by calling .to(at::kCUDA), so the input tensor also needed to be moved to the GPU before calling forward.

Great that you figured that out. Is it crashing as in “segfault”, or does it at least print an error message on the console?

Best regards

Thomas

There actually is no error message; it exits silently.
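
One way to narrow down a silent exit like this is to wrap the suspicious call in a try/catch and print whatever is thrown. The following is a minimal sketch, not part of the original program: run_reported, label and step are hypothetical names, and the helper uses only the standard library. It relies on the fact that c10::Error, the exception type libtorch throws, derives from std::exception. It can only report C++ exceptions; a real segfault (access violation) would still kill the process.

```cpp
#include <exception>
#include <functional>
#include <iostream>
#include <string>

// Hypothetical helper: runs `step` and reports any C++ exception on stderr
// instead of letting it escape from main.
// Returns true if `step` completed, false if an exception was caught.
bool run_reported(const std::string& label, const std::function<void()>& step) {
  try {
    step();
    return true;
  } catch (const std::exception& e) {
    // Exceptions derived from std::exception (c10::Error is one of them)
    // end up here, so their message becomes visible on the console.
    std::cerr << label << " failed: " << e.what() << std::endl;
  } catch (...) {
    // Anything that is not derived from std::exception.
    std::cerr << label << " failed with an unknown exception" << std::endl;
  }
  return false;
}
```

In the program from the question, the forward call could then be wrapped as run_reported("forward", [&] { output = segnet3.forward(inputs).toTensor(); }), with output declared beforehand. If nothing is printed and the process still dies, the failure is probably not a C++ exception, and a debugger backtrace is the next thing to try.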