PyTorch C++: loading a model with an LSTM causes a core dump

I want to train a model in Python and run prediction in C++. I followed the PyTorch C++ tutorial and it works well. However, when I load my own simple model in C++, there is a core dump.

My Python model looks like this:

import numpy as np
import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self, N, M):
        super(MyModule, self).__init__()
        self.lstm = nn.LSTM(M, M, batch_first=True)
        self.linear = nn.Linear(M, 1)

    def forward(self, inputs, h0, c0):
        # nn.LSTM expects the hidden state as a single (h0, c0) tuple.
        output, (_, _) = self.lstm(inputs, (h0, c0))
        output, _ = torch.max(output, dim=1)
        output = self.linear(output)
        return output

batch_size = 8
h = 33
w = 45
model = MyModule(h, w)
data = np.random.normal(1, 1, size=(batch_size, h, w))
data = torch.Tensor(data)
h0, c0 = torch.zeros(1, batch_size, w), torch.zeros(1, batch_size, w)

traced_script_module = torch.jit.trace(model, (data, h0, c0))
traced_script_module.save('model.pt')
print(traced_script_module(data, h0, c0))  # the trace takes all three inputs
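
As a sanity check, the traced module can be compared against the eager model on the example inputs (a minimal sketch):

model.eval()
with torch.no_grad():
    eager_out = model(data, h0, c0)
    traced_out = traced_script_module(data, h0, c0)
# The trace should reproduce the eager output on the traced inputs.
assert torch.allclose(eager_out, traced_out, atol=1e-6)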

My C++ code looks like this:

#include <torch/script.h>

#include <cassert>
#include <iostream>
#include <memory>
#include <vector>

int main(int argc, const char* argv[]) {
  if (argc != 2) {
    std::cerr << "usage: example-app <path-to-exported-script-module>\n";
    return -1;
  }

  // Deserialize the ScriptModule from a file using torch::jit::load().
  std::shared_ptr<torch::jit::script::Module> module = torch::jit::load(argv[1]);

  assert(module != nullptr);
  std::cout << "ok\n";
  module->to(at::Device("cuda:0"));

  std::vector<torch::jit::IValue> inputs;
  int b = 2, h = 33, w = 45;
  std::vector<float> data(b * h * w, 1.0f);
  // h0/c0 have shape (num_layers, batch, hidden), i.e. (1, b, w).
  std::vector<float> h0_data(1 * b * w, 0.0f);
  std::vector<float> c0_data(1 * b * w, 0.0f);
  // from_blob does not copy or take ownership, so the backing vectors
  // must outlive it; .to() then copies the data onto the GPU.
  torch::Tensor data_tensor =
      torch::from_blob(data.data(), {b, h, w}).to(at::Device("cuda:0"));
  torch::Tensor h0 =
      torch::from_blob(h0_data.data(), {1, b, w}).to(at::Device("cuda:0"));
  torch::Tensor c0 =
      torch::from_blob(c0_data.data(), {1, b, w}).to(at::Device("cuda:0"));
  inputs.push_back(data_tensor);
  inputs.push_back(h0);
  inputs.push_back(c0);
  torch::Tensor output = module->forward(inputs).toTensor().cpu();
  auto accessor = output.accessor<float, 2>();
  std::vector<float> answer(b);
  for (int i = 0; i < accessor.size(0); ++i) {
    answer[i] = accessor[i][0];
  }
  std::cout << "predict ok" << std::endl;
  return 0;
}
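
To rule out a problem with how the inputs are built, the same call can be reproduced in Python against the saved file (a minimal sketch, assuming the model.pt saved above and an available CUDA device):

import torch

loaded = torch.jit.load('model.pt')
loaded.to('cuda:0')
b, h, w = 2, 33, 45
data = torch.ones(b, h, w, device='cuda:0')
h0 = torch.zeros(1, b, w, device='cuda:0')
c0 = torch.zeros(1, b, w, device='cuda:0')
# These values should match what the C++ accessor reads out.
print(loaded(data, h0, c0).cpu())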

Then I run the C++ code and face two problems:

1. Warning

Warning: RNN module weights are not part of single contiguous chunk of memory. 
This means they need to be compacted at every call, possibly greatly increasing memory usage. 
To compact weights again call flatten_parameters(). (_cudnn_impl at ..\aten\src\ATen\native\cudnn\RNN.cpp:1249)

When I initialize the hidden state tensors inside the model, the warning disappears, but then I have to hard-code moving the tensors to the GPU:

class MyModule(nn.Module):
    def __init__(self, N, M):
        super(MyModule, self).__init__()
        self.lstm = nn.LSTM(M, M, batch_first=True)
        self.linear = nn.Linear(M, 1)
        self.inputs_size = [N,M]

    def get_size(self):
        return self.inputs_size

    def init_hidden(self, size):
        hidden = torch.zeros(1, size, self.inputs_size[1])
        context = torch.zeros(1, size, self.inputs_size[1])

        hidden = hidden.to(torch.device('cuda:0'))
        context = context.to(torch.device('cuda:0'))

        return hidden, context

    def forward(self, inputs):
        # init_hidden returns the (h0, c0) tuple that nn.LSTM expects.
        output, (_, _) = self.lstm(inputs, self.init_hidden(inputs.size(0)))
        output, _ = torch.max(output, dim=1)
        output = self.linear(output)
        return output
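
Alternatively, the warning itself suggests calling flatten_parameters(). Below is a minimal sketch of that variant (the class name is my own, and it also derives the device from the input instead of hard-coding cuda:0):

class MyModuleCompact(nn.Module):
    def __init__(self, N, M):
        super(MyModuleCompact, self).__init__()
        self.lstm = nn.LSTM(M, M, batch_first=True)
        self.linear = nn.Linear(M, 1)
        self.hidden_size = M

    def forward(self, inputs):
        # Re-compact the LSTM weights into one contiguous chunk of
        # memory, as the warning message advises.
        self.lstm.flatten_parameters()
        # new_zeros allocates on the same device and with the same dtype
        # as inputs, so nothing is hard-coded to cuda:0.
        h0 = inputs.new_zeros(1, inputs.size(0), self.hidden_size)
        c0 = inputs.new_zeros(1, inputs.size(0), self.hidden_size)
        output, _ = self.lstm(inputs, (h0, c0))
        output, _ = torch.max(output, dim=1)
        return self.linear(output)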

2. Core dump

After the program finishes, it core dumps. When I remove the LSTM from my model, the core dump disappears, so what is the problem?

Could you get the stack trace of the core dump?

The bug is fixed in libtorch 1.3.0. Thanks.