How to use multi-GPU inference in libtorch?

I want to use libtorch for multi-GPU inference. Is there any example or tutorial?
Should I create multiple jit::script::Module instances and move each one to a different GPU?

If I’m not mistaken, torch::nn::parallel::data_parallel would be the equivalent of nn.DataParallel in the Python frontend. In case you would like to use DistributedDataParallel, feel free to add your use case in this poll.
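
For reference, here is a minimal sketch of how it can be called from the C++ frontend. This assumes two visible GPUs and a plain torch::nn module (a bare Linear layer here just to keep the sketch short):

    #include <torch/torch.h>
    #include <torch/nn/parallel/data_parallel.h>
    #include <iostream>

    int main() {
      // Any C++-frontend module works; Linear keeps the example small.
      auto model = std::make_shared<torch::nn::LinearImpl>(64, 8);
      model->to(torch::kCUDA);

      auto input = torch::randn({32, 64}, torch::kCUDA);

      // Scatters the batch along dim 0 across the listed devices, replicates
      // the module onto each of them, runs the shards, and gathers the outputs.
      auto output = torch::nn::parallel::data_parallel(
          model, input,
          std::vector<torch::Device>{torch::Device(torch::kCUDA, 0),
                                     torch::Device(torch::kCUDA, 1)});

      std::cout << output.sizes() << std::endl;
    }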

I want to use multiple GPUs manually, because the input data sizes are different.

for (int i = 0; i < param_.gpu_size(); ++i) {
    torch::Device device(torch::kCUDA, i);
    models.push_back(torch::jit::load(MODEL_PATH, device));
    models.back()->eval();
}

Then I will create gpu_size threads to run the inference.
Is this the correct way?
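
Roughly, each worker thread would run a simplified version of my predict_torch, something like this sketch (the real function takes Sample batches and more parameters, but the device handling is the relevant part):

    #include <torch/script.h>
    #include <c10/cuda/CUDAGuard.h>

    // Simplified per-GPU worker: runs one loaded module on its own device.
    void predict_torch(torch::jit::Module* model,
                       const std::vector<torch::Tensor>& inputs,
                       int gpu_id) {
        torch::Device device(torch::kCUDA, gpu_id);
        c10::cuda::CUDAGuard device_guard(device);  // make gpu_id the current CUDA device
        torch::NoGradGuard no_grad;                 // inference only

        for (const auto& input : inputs) {
            auto output = model->forward({input.to(device)}).toTensor();
            // ... post-process output ...
        }
    }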

I used 2 threads to run the model on 2 GPUs, like this:

      std::vector<std::thread*> threads(0);
      int parallel = std::min(param_.gpu_size(),
                              static_cast<int>(batched_samples.size()));
      batches.resize(parallel);
      for (int i = 0; i < parallel; ++i) {
          batches[i].resize(0);
          for (int j = i; j < batched_samples.size(); j += parallel) {
              batches[i].push_back(std::move(batched_samples[j]));
          }
          LOG(ERROR) << "==== predict level-" << l << " batch-"
                     << batches[i].size();

          threads.push_back(new std::thread(
                  &predict_torch, models_[i],
                  std::ref(batches[i]), param_.gpu(i)));
      }
      for (int i = 0; i < threads.size(); ++i) {
          threads[i]->join();
          delete threads[i];
      }

But I got this error:

terminate called after throwing an instance of 'c10::Error'
  what():  r INTERNAL ASSERT FAILED at "../aten/src/ATen/core/jit_type_base.h":172, please report a bug to PyTorch. 
Exception raised from expect at ../aten/src/ATen/core/jit_type_base.h:172 (most recent call first):
frame #0: <unknown function> + 0x101f9b (0x7fa6e14e6f9b in /root/anaconda3/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>::operator()() const + 0x4c (0x7fa6e14e7e20 in /root/anaconda3/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #2: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x40 (0x7fa6e14e6030 in /root/anaconda3/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #3: std::shared_ptr<c10::ClassType> c10::Type::expect<c10::ClassType>() + 0xbb (0x7fa6c2ae3b1b in /root/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #4: c10::ivalue::Object::type() const + 0x41 (0x7fa6c2ad10f1 in /root/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #5: <unknown function> + 0x5ec04cf (0x7fa6c69c14cf in /root/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #6: torch::jit::Object::find_method(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const + 0x37 (0x7fa6c69d48c1 in /root/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #7: torch::jit::Object::get_method(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const + 0x50 (0x7fa6d7179414 in /root/ecopia-weaver-multi-scale-framework/src/build/libecopia_weaver.so)
frame #8: torch::jit::Module::forward(std::vector<c10::IValue, std::allocator<c10::IValue> >) + 0xba (0x7fa6d717979a in /root/ecopia-weaver-multi-scale-framework/src/build/libecopia_weaver.so)
frame #9: ecopia::ml::CaffeForwardMultiScale::predict_torch(torch::jit::Module*, std::vector<std::vector<ecopia::ml::Sample*, std::allocator<ecopia::ml::Sample*> >, std::allocator<std::vector<ecopia::ml::Sample*, std::allocator<ecopia::ml::Sample*> > > > const&, int, int, int, int, ecopia::ml::LevelInfo const&, std::unordered_map<int, MultiChannelRasterData<float>*, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, MultiChannelRasterData<float>*> > >&, std::unordered_map<int, MultiChannelLabelMap<unsigned char>*, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, MultiChannelLabelMap<unsigned char>*> > >&, std::unordered_map<int, SingleChannelLabelMap<unsigned char>*, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, SingleChannelLabelMap<unsigned char>*> > >&) + 0x139c (0x7fa6d7172224 in /root/ecopia-weaver-multi-scale-framework/src/build/libecopia_weaver.so)

Does anyone have any ideas?

If I run the model on only one of the two GPUs, either GPU works fine.