Error in loading the model

When I make a small project to check, it works fine… But as soon as I put the same thing into Kaldi (PyTorch lattice rescoring) and run the executable, I receive the error shown below.

Error

data/pytorch/rnnlm/newmodel2.pt error loading the model
_ivalue_ INTERNAL ASSERT FAILED at "/pytorch/torch/csrc/jit/api/object.cpp":19, please report a bug to PyTorch.  (_ivalue at /pytorch/torch/csrc/jit/api/object.cpp:19)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x46 (0x7f83b939d666 in /home/rakesh/rishabh_workspace/Garbage/kaldi/tools/libtorch/lib/libc10.so)
frame #1: torch::jit::Object::_ivalue() const + 0xab (0x7f83acdc42cb in /home/rakesh/rishabh_workspace/Garbage/kaldi/tools/libtorch/lib/libtorch_cpu.so)
frame #2: torch::jit::Object::find_method(std::string const&) const + 0x26 (0x7f83acdc43a6 in /home/rakesh/rishabh_workspace/Garbage/kaldi/tools/libtorch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0x9192f (0x5592eb17892f in lattice-lmrescore-py-rnnlm)
frame #4: <unknown function> + 0x86b78 (0x5592eb16db78 in lattice-lmrescore-py-rnnlm)
frame #5: <unknown function> + 0x867d8 (0x5592eb16d7d8 in lattice-lmrescore-py-rnnlm)
frame #6: <unknown function> + 0x23c38 (0x5592eb10ac38 in lattice-lmrescore-py-rnnlm)
frame #7: __libc_start_main + 0xe7 (0x7f836b610b97 in /lib/x86_64-linux-gnu/libc.so.6)
frame #8: <unknown function> + 0x2340a (0x5592eb10a40a in lattice-lmrescore-py-rnnlm)

pyrnnlm.cc




#include <algorithm>  // for std::remove (used in ReadPyModel)
#include <utility>
#include <fstream>

#include "pyrnnlm/pytorch-rnnlm.h"
#include "util/stl-utils.h"
#include "util/text-utils.h"

// The torch includes are placed after the pyrnnlm/pytorch-rnnlm.h include to
// avoid macro redefinitions. See also the note in tfrnnlm/tensorflow-rnnlm.h,
// which this file was adapted from.
#include <torch/torch.h>
#include <torch/script.h>
#include <iostream>
#include <memory>
#include <dirent.h>

namespace kaldi {
using std::ifstream;
using py_rnnlm::KaldiPyRnnlmWrapper;
using py_rnnlm::PyRnnlmDeterministicFst;


// Read a unigram count file of the OOSs and generate extra OOS costs for
// words (the log relative frequency of each word in the count file).
void SetUnkPenalties(const string &filename,
                     const fst::SymbolTable& fst_word_symbols,
                     std::vector<float> *out) {
  if (filename == "")
    return;
  out->resize(fst_word_symbols.NumSymbols(), 0);  // default is 0
  ifstream ifile(filename.c_str());
  string word;
  float count, total_count = 0;
  while (ifile >> word >> count) {
    int id = fst_word_symbols.Find(word);
    KALDI_ASSERT(id != -1); // fst::kNoSymbol
    (*out)[id] = count;
    total_count += count;
  }

  for (int i = 0; i < out->size(); i++) {
    if ((*out)[i] != 0) {
      (*out)[i] = log ((*out)[i] / total_count);
    }
  }
}

// Read pytorch model files
// Done ****
void KaldiPyRnnlmWrapper::ReadPyModel(const std::string &py_model_path,
                                      int32 num_threads) {

  // 'module' is a class member of type torch::jit::script::Module;
  // it still needs to be initialised here via torch::jit::load().
  try {
    // Deserialize the ScriptModule from a file using torch::jit::load().
    //std::cout << "Model " << py_model_path.substr(0, py_model_path.size()-1) << " /newmodel2.pt";
    std::string file("/newmodel2.pt");
    // Load model in the module
    std::string path = py_model_path + file;

    // Strip any stray spaces from the path (erase-remove idiom).
    path.erase(std::remove(path.begin(), path.end(), ' '), path.end());
    std::cout << path;
    module = torch::jit::load(path);
  }
  catch (const c10::Error& e) {
    std::cerr << " error loading the model\n";
    //return -1;
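    // NOTE: returning here leaves the member 'module' default-constructed;
    // any later get_method()/find_method() call on it will fail with the
    // "_ivalue_ INTERNAL ASSERT FAILED" error shown above.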
    return;
  }

  std::cout << "Language Model\n\n";

  // (Samrat): I think we only need a few of these, not all.
  word_id_tensor_name_ = "word_id";
  context_tensor_name_ = "context";
  log_prob_tensor_name_ = "log_prob";
  rnn_out_tensor_name_ = "rnn_out";
  rnn_states_tensor_name_ = "rnn_states";
  initial_state_tensor_name_ = "initial_state";
  
}

// Done ****
// Batch size is hard-coded to 1.
KaldiPyRnnlmWrapper::KaldiPyRnnlmWrapper(
    const KaldiPyRnnlmWrapperOpts &opts,
    const std::string &rnn_wordlist,
    const std::string &word_symbol_table_rxfilename,
    const std::string &unk_prob_file,
    const std::string &py_model_path): opts_(opts) {
  ReadPyModel(py_model_path, opts.num_threads);

  fst::SymbolTable *fst_word_symbols = NULL;
  if (!(fst_word_symbols =
        fst::SymbolTable::ReadText(word_symbol_table_rxfilename))) {
    KALDI_ERR << "Could not read symbol table from file "
        << word_symbol_table_rxfilename;
  }

  fst_label_to_word_.resize(fst_word_symbols->NumSymbols());

  for (int32 i = 0; i < fst_label_to_word_.size(); ++i) {
    fst_label_to_word_[i] = fst_word_symbols->Find(i);
    if (fst_label_to_word_[i] == "") {
      KALDI_ERR << "Could not find word for integer " << i << " in the word "
          << "symbol table, mismatched symbol table or you have discoutinuous "
          << "integers in your symbol table?";
    }
  }

  // first put all -1's; will check later
  fst_label_to_rnn_label_.resize(fst_word_symbols->NumSymbols(), -1);
  num_total_words = fst_word_symbols->NumSymbols();

  // read rnn wordlist and then generate ngram-label-to-rnn-label map
  oos_ = -1;
  { // input.
    ifstream ifile(rnn_wordlist.c_str());
    string word;
    int id = -1;
    eos_ = 0;
    while (ifile >> word) {
      id++;
      rnn_label_to_word_.push_back(word);  // vector[i] = word

      int fst_label = fst_word_symbols->Find(word);
      if (fst_label == -1) { // fst::kNoSymbol
        if (id == eos_)
          continue;

        KALDI_ASSERT(word == opts_.unk_symbol && oos_ == -1);
        oos_ = id;
        continue;
      }
      KALDI_ASSERT(fst_label >= 0);
      fst_label_to_rnn_label_[fst_label] = id;
    }
  }
  if (fst_label_to_word_.size() > rnn_label_to_word_.size()) {
    KALDI_ASSERT(oos_ != -1);
  }
  num_rnn_words = rnn_label_to_word_.size();

  // we must have an oos symbol in the wordlist
  if (oos_ == -1)
    return;

  for (int i = 0; i < fst_label_to_rnn_label_.size(); i++) {
    if (fst_label_to_rnn_label_[i] == -1) {
      fst_label_to_rnn_label_[i] = oos_;
    }
  }

  AcquireInitialTensors();
  SetUnkPenalties(unk_prob_file, *fst_word_symbols, &unk_costs_);
  delete fst_word_symbols;
}

KaldiPyRnnlmWrapper::~KaldiPyRnnlmWrapper() {
}
// Done
 
void KaldiPyRnnlmWrapper::AcquireInitialTensors() {
  // Status status;
  // get the initial context; this is basically the all-0 tensor
  /*
    (Samrat): Still have to figure out get_initial_state(batch_size):
    what should the batch size be?
  */
  //auto hidden = module.get_method("get_initial_state")({torch::tensor({1})});
  //initial_context_ = hidden.toTensor();

  initial_context_ =
      module.get_method("get_initial_state")({torch::tensor({1})}).toTensor();


  // (Samrat): changed the function call name.
  auto bosword = torch::tensor({eos_});

  auto hidden = module.get_method("single_step_rnn_out")({initial_context_, bosword});
  initial_cell_ = hidden.toTensor();




  // {
  //   std::vector<torch::Tensor> state;
  //   status = bundle_.session->Run(std::vector<std::pair<string, torch::Tensor> >(),
  //                          {initial_state_tensor_name_}, {}, &state);
  //   if (!status.ok()) {
  //     KALDI_ERR << status.ToString();
  //   }
  //   initial_context_ = state[0];
  // }

  // get the initial pre-final-affine layer
  // {
  //   std::vector<torch::Tensor> state;
  //   torch::Tensor bosword(tensorflow::DT_INT32, {1, 1});
  //   bosword.scalar<int32>()() = eos_;  // eos_ is more like a sentence boundary

  //   std::vector<std::pair<string, torch::Tensor> > inputs = {
  //     {word_id_tensor_name_, bosword},
  //     {context_tensor_name_, initial_context_},
  //   };

  //   status = bundle_.session->Run(inputs, {rnn_out_tensor_name_}, {}, &state);
  //   if (!status.ok()) {
  //     KALDI_ERR << status.ToString();
  //   }
  //   initial_cell_ = state[0];
  // }
}


/*
// Need to change *****
BaseFloat KaldiPyRnnlmWrapper::GetLogProb(int32 word,
                                          int32 fst_word,
                                          const torch::Tensor &context_in,
                                          const torch::Tensor &cell_in,
                                          torch::Tensor *context_out,
                                          torch::Tensor *new_cell) {
  torch::Tensor thisword(torch::Tensor, {1, 1});

  thisword.scalar<int32>()() = word;

  std::vector<torch::Tensor> outputs;

  std::vector<std::pair<string, torch::Tensor> > inputs = {
    {word_id_tensor_name_, thisword},
    {context_tensor_name_, context_in},
  };

  if (context_out != NULL) {
    // The session will initialize the outputs
    // Run the session, evaluating our "c" operation from the graph
    Status status = bundle_.session->Run(inputs,
        {log_prob_tensor_name_,
         rnn_out_tensor_name_,
         rnn_states_tensor_name_}, {}, &outputs);
    if (!status.ok()) {
      KALDI_ERR << status.ToString();
    }

    *context_out = outputs[1];
    *new_cell = outputs[2];
  } else {
    // Run the session, evaluating our "c" operation from the graph
    Status status = bundle_.session->Run(inputs,
        {log_prob_tensor_name_}, {}, &outputs);
    if (!status.ok()) {
      KALDI_ERR << status.ToString();
    }
  }

  float ans;
  if (word != oos_) {
    ans = outputs[0].scalar<float>()();
  } else {
    if (unk_costs_.size() == 0) {
      ans = outputs[0].scalar<float>()() - log(num_total_words - num_rnn_words);
    } else {
      ans = outputs[0].scalar<float>()() + unk_costs_[fst_word];
    }
  }

  return ans;
}
*/

/*
  Below is my (Samrat) modified version of the function above.
  Replace it if you think something is incorrect.
*/


BaseFloat KaldiPyRnnlmWrapper::GetLogProb(int32 word,
                                          int32 fst_word,
                                          const torch::Tensor &context_in,
                                          const torch::Tensor &cell_in,
                                          torch::Tensor *context_out,
                                          torch::Tensor *new_cell) {
  //torch::Tensor thisword(torch::Tensor, {1, 1});
  
  //thisword.scalar<int32>()() = word;
  torch::Tensor thisword = torch::tensor({word});


  //std::vector<torch::Tensor> outputs;

  // std::vector<std::pair<string, torch::Tensor> > inputs = {
  //   {word_id_tensor_name_, thisword},
  //   {context_tensor_name_, context_in},
  // };



  auto outputs = module.get_method("single_step")({context_in, thisword});
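  // NOTE: 'outputs' above is currently unused; the new context, cell state,
  // and log-prob are each obtained by the separate single_step_* calls below.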
  if (context_out != NULL) {
    // The session will initialize the outputs
    // Run the session, evaluating our "c" operation from the graph
    // Status status = bundle_.session->Run(inputs,
    //     {log_prob_tensor_name_,
    //      rnn_out_tensor_name_,
    //      rnn_states_tensor_name_}, {}, &outputs);

    // if (!status.ok()) {
    //   KALDI_ERR << status.ToString();
    // }

    *context_out = module.get_method("single_step_rnn_out")({context_in, thisword}).toTensor();
    *new_cell = module.get_method("single_step_rnn_state")({context_in, thisword}).toTensor();
  } //else {
    // Run the session, evaluating our "c" operation from the graph
    // Status status = bundle_.session->Run(inputs,
    //     {log_prob_tensor_name_}, {}, &outputs);
    // if (!status.ok()) {
    //   KALDI_ERR << status.ToString();
    // }
  //}

  /*
    (Samrat): This can throw an error, so it has to be checked manually in
    testLM. Hopefully it returns a float.
  */

  float log_prob = static_cast<float>(
      module.get_method("single_step_log")({context_in, thisword}).toDouble());
  float ans;
  if (word != oos_) {
    //ans = outputs[0].scalar<float>()();
    ans = log_prob;
  } else {
    if (unk_costs_.size() == 0) {
      //ans = outputs[0].scalar<float>()() - log(num_total_words - num_rnn_words);
      ans = log_prob - log(num_total_words - num_rnn_words);
    } else {
      //ans = outputs[0].scalar<float>()() + unk_costs_[fst_word];
      ans = log_prob + unk_costs_[fst_word];
    }
  }

  return ans;
}

// Done *****
const torch::Tensor& KaldiPyRnnlmWrapper::GetInitialContext() const {
  return initial_context_;
}

const torch::Tensor& KaldiPyRnnlmWrapper::GetInitialCell() const {
  return initial_cell_;
}

int KaldiPyRnnlmWrapper::FstLabelToRnnLabel(int i) const {
  KALDI_ASSERT(i >= 0 && i < fst_label_to_rnn_label_.size());
  return fst_label_to_rnn_label_[i];
}


// Done *****
PyRnnlmDeterministicFst::PyRnnlmDeterministicFst(int32 max_ngram_order,
                                             KaldiPyRnnlmWrapper *rnnlm) {
  KALDI_ASSERT(rnnlm != NULL);
  max_ngram_order_ = max_ngram_order;
  rnnlm_ = rnnlm;

  std::vector<Label> bos;
  const torch::Tensor& initial_context = rnnlm_->GetInitialContext();
  const torch::Tensor& initial_cell = rnnlm_->GetInitialCell();

  state_to_wseq_.push_back(bos);
  state_to_context_.push_back(new torch::Tensor(initial_context));
  state_to_cell_.push_back(new torch::Tensor(initial_cell));
  wseq_to_state_[bos] = 0;
  start_state_ = 0;
}

// Done *****
PyRnnlmDeterministicFst::~PyRnnlmDeterministicFst() {
  for (int i = 0; i < state_to_context_.size(); i++) {
    delete state_to_context_[i];
  }
  for (int i = 0; i < state_to_cell_.size(); i++) {
    delete state_to_cell_[i];
  }
}

// Done *****
void PyRnnlmDeterministicFst::Clear() {
  // similar to the destructor but we retain the 0-th entries in each map
  // which corresponds to the <bos> state
  for (int i = 1; i < state_to_context_.size(); i++) {
    delete state_to_context_[i];
  }
  for (int i = 1; i < state_to_cell_.size(); i++) {
    delete state_to_cell_[i];
  }

  state_to_context_.resize(1);
  state_to_cell_.resize(1);
  state_to_wseq_.resize(1);
  wseq_to_state_.clear();
  wseq_to_state_[state_to_wseq_[0]] = 0;
}

// Done *****
fst::StdArc::Weight PyRnnlmDeterministicFst::Final(StateId s) {
  // At this point, we should have created the state.
  KALDI_ASSERT(static_cast<size_t>(s) < state_to_wseq_.size());

  std::vector<Label> wseq = state_to_wseq_[s];
  BaseFloat logprob = rnnlm_->GetLogProb(rnnlm_->GetEos(),
                         -1,  // only need type; this param will not be used
                         *state_to_context_[s],
                         *state_to_cell_[s], NULL, NULL);
  return Weight(-logprob);
}

// Done *****
bool PyRnnlmDeterministicFst::GetArc(StateId s, Label ilabel,
                                     fst::StdArc *oarc) {
  KALDI_ASSERT(static_cast<size_t>(s) < state_to_wseq_.size());

  std::vector<Label> wseq = state_to_wseq_[s];
  torch::Tensor *new_context = new torch::Tensor();
  torch::Tensor *new_cell = new torch::Tensor();
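  // Ownership note: if this word sequence creates a new FST state, these two
  // tensors are stored in state_to_context_/state_to_cell_ below; otherwise
  // they are deleted.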

  // look-up the rnn label from the FST label
  int32 rnn_word = rnnlm_->FstLabelToRnnLabel(ilabel);
  BaseFloat logprob = rnnlm_->GetLogProb(rnn_word,
                                         ilabel,
                                         *state_to_context_[s],
                                         *state_to_cell_[s],
                                         new_context,
                                         new_cell);

  wseq.push_back(rnn_word);
  if (max_ngram_order_ > 0) {
    while (wseq.size() >= max_ngram_order_) {
      // History state has at most <max_ngram_order_> - 1 words in the state.
      wseq.erase(wseq.begin(), wseq.begin() + 1);
    }
  }

  std::pair<const std::vector<Label>, StateId> wseq_state_pair(
      wseq, static_cast<Label>(state_to_wseq_.size()));

  // Attempts to insert the current <wseq_state_pair>. If the pair already
  // exists, then it returns false.
  typedef MapType::iterator IterType;
  std::pair<IterType, bool> result = wseq_to_state_.insert(wseq_state_pair);

  // If the pair was just inserted, then also add it to <state_to_wseq_> and
  // <state_to_context_>.
  if (result.second == true) {
    state_to_wseq_.push_back(wseq);
    state_to_context_.push_back(new_context);
    state_to_cell_.push_back(new_cell);
  } else {
    delete new_context;
    delete new_cell;
  }

  // Creates the arc.
  oarc->ilabel = ilabel;
  oarc->olabel = ilabel;
  oarc->nextstate = result.first->second;
  oarc->weight = Weight(-logprob);

  return true;
}

}  // namespace kaldi

@krrishabh
Can you post this in the ‘jit’ category?

Also, check which pytorch/libtorch version was used to generate your .pt file and which libtorch version you used to load it. This might be a version mismatch as well.

I have changed the tag from C++ to jit.
libtorch is used to load the model.
libtorch build version: 1.6.0.dev20200501+cu101

The model was saved with PyTorch version 1.3.1.

Moreover, I have made a small project where it works fine… but in my Kaldi project the model fails to load. (Both have the same configuration.)

@krrishabh
I think a version mismatch might be the root cause.
If you saved your .pt with 1.3.1, please use the same version of libtorch to load it.
We have a similar issue here:
https://github.com/pytorch/pytorch/issues/39623
Please read the comments there to see how to download libtorch 1.3.1, or, if possible, upgrade your pytorch to 1.6dev; I believe 1.5 is fine as well.
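
Either way, it helps to print the underlying c10::Error message instead of swallowing it; e.what() usually tells you whether the real cause is a missing file or an unsupported format/operator version. A minimal stand-alone sketch (the model path is just the one from your log):

#include <torch/script.h>

#include <iostream>

int main() {
  try {
    torch::jit::script::Module module =
        torch::jit::load("data/pytorch/rnnlm/newmodel2.pt");
  } catch (const c10::Error& e) {
    // e.what() carries the real reason, not just a generic message.
    std::cerr << "error loading the model: " << e.what() << "\n";
    return 1;
  }
  return 0;
}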

I have upgraded PyTorch to version 1.5 but I am getting the same error. Moreover, it does not seem to be a version problem, because when I build a small project it works correctly.


@krrishabh
Since the error message says “error loading the model”, it is likely the program failed to load the model file in torch::jit::load(path).

Please make sure data/pytorch/rnnlm/newmodel2.pt exists and it is accessible from the binary.
If you compile the program into build directory, the path might be ../data/pytorch/rnnlm/newmodel2.pt.
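
To rule that out, here is a minimal POSIX-only sketch (the helper name is mine) you could call right before torch::jit::load():

#include <unistd.h>  // getcwd (POSIX)

#include <fstream>
#include <iostream>
#include <string>

// Returns true if 'path' can be opened from the current working directory.
bool ModelFileReadable(const std::string &path) {
  char cwd[4096];
  if (getcwd(cwd, sizeof(cwd)) != nullptr)
    std::cerr << "working directory: " << cwd << "\n";
  std::ifstream f(path, std::ios::binary);
  if (!f.good()) {
    std::cerr << "cannot open model file: " << path << "\n";
    return false;
  }
  return true;
}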

And the message “_ivalue_ INTERNAL ASSERT FAILED at ...” is shown when an instance of torch::jit::script::Module is not initialized (for example, when the exception from torch::jit::load() is caught and the uninitialized module instance is referenced afterwards).
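
A minimal sketch that reproduces that second failure mode, and the fix (fail fast instead of continuing with the uninitialized module):

#include <torch/script.h>

#include <cstdlib>
#include <iostream>

int main() {
  torch::jit::script::Module module;  // default-constructed, not yet loaded
  try {
    module = torch::jit::load("nonexistent.pt");  // throws c10::Error
  } catch (const c10::Error &e) {
    std::cerr << "error loading the model\n";
    // Fail fast here. If we only logged and continued, the get_method() call
    // below would run on the uninitialized module and hit the internal assert.
    std::exit(1);
  }
  // If the exit above were removed, this line would abort with:
  //   _ivalue_ INTERNAL ASSERT FAILED at .../torch/csrc/jit/api/object.cpp
  auto method = module.get_method("forward");
  return 0;
}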

@m4saka
Did you figure out this issue? I’ve been suffering from the same problem. torch::jit::load loads the model without a problem as an individual project, but when it is combined with another project, it shows:

error loading the model
terminate called after throwing an instance of ‘c10::Error’
what(): ivalue INTERNAL ASSERT FAILED at “…/torch/csrc/jit/api/object.cpp”:19, please report a bug to PyTorch.
Exception raised from _ivalue at …/torch/csrc/jit/api/object.cpp:19 (most recent call first):


I am facing the same problem. In an individual console project it runs without any problem, but in the combined project it can’t load the same model.

Did you find the solution to your problem? I am stuck as well…