I want to use libtorch for multi gpu inference, is there any example or tutorial?
Should I clone multi jit::script::Module
and move them to different gpu?
If I’m not mistaken torch::nn::parallel::data_parallel
would be the equivalent to nn.DataParallel
in the Python frontend and in case you would like to use DistributedDataParallel
feel free to add your use case in this poll.
I want to use multi gpu manually, because the input data size is different.
for (int i = 0; i < param_.gpu_size(); ++i) {
torch::Device device(torch::kCUDA, i);
models.push_back(torch::jit::load(MODEL_PATH, device));
models.back()->eval();
}
And I will create gpu_size
threads to run the inference.
Is it the correct way?
I used 2 threads to run the model on 2 gpus, like:
std::vector<std::thread*> threads(0);
int parallel = std::min(param_.gpu_size(),
static_cast<int>(batched_samples.size()));
batches.resize(parallel);
for (int i = 0; i < parallel; ++i) {
batches[i].resize(0);
for (int j = i; j < betched_samples.size(); j += parallel) {
batches[i].push_back(std::move(batched_samples[j]));
}
LOG(ERROR) << "==== predict level-" << l << " batch-"
<< batches[i].size();
threads.push_back(new std::thread(
&predict_torch, models_[i],
std::ref(batches[i]), param_.gpu(i));
}
for (int i = 0; i < threads.size(); ++i) {
threads[i]->join();
delete threads[i];
}
But I got the:
terminate called after throwing an instance of 'c10::Error'
what(): r INTERNAL ASSERT FAILED at "../aten/src/ATen/core/jit_type_base.h":172, please report a bug to PyTorch.
Exception raised from expect at ../aten/src/ATen/core/jit_type_base.h:172 (most recent call first):
frame #0: <unknown function> + 0x101f9b (0x7fa6e14e6f9b in /root/anaconda3/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>::operator()() const + 0x4c (0x7fa6e14e7e20 in /root/anaconda3/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #2: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x40 (0x7fa6e14e6030 in /root/anaconda3/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #3: std::shared_ptr<c10::ClassType> c10::Type::expect<c10::ClassType>() + 0xbb (0x7fa6c2ae3b1b in /root/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #4: c10::ivalue::Object::type() const + 0x41 (0x7fa6c2ad10f1 in /root/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #5: <unknown function> + 0x5ec04cf (0x7fa6c69c14cf in /root/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #6: torch::jit::Object::find_method(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const + 0x37 (0x7fa6c69d48c1 in /root/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.s
o)
frame #7: torch::jit::Object::get_method(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const + 0x50 (0x7fa6d7179414 in /root/ecopia-weaver-multi-scale-framework/src/build/libecopia_weaver.
so)
frame #8: torch::jit::Module::forward(std::vector<c10::IValue, std::allocator<c10::IValue> >) + 0xba (0x7fa6d717979a in /root/ecopia-weaver-multi-scale-framework/src/build/libecopia_weaver.so)
frame #9: ecopia::ml::CaffeForwardMultiScale::predict_torch(torch::jit::Module*, std::vector<std::vector<ecopia::ml::Sample*, std::allocator<ecopia::ml::Sample*> >, std::allocator<std::vector<ecopia::ml::Sample*, std::allocator<ec
opia::ml::Sample*> > > > const&, int, int, int, int, ecopia::ml::LevelInfo const&, std::unordered_map<int, MultiChannelRasterData<float>*, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, MultiChannelRasterD
ata<float>*> > >&, std::unordered_map<int, MultiChannelLabelMap<unsigned char>*, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, MultiChannelLabelMap<unsigned char>*> > >&, std::unordered_map<int, SingleCha
nnelLabelMap<unsigned char>*, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, SingleChannelLabelMap<unsigned char>*> > >&) + 0x139c (0x7fa6d7172224 in /root/ecopia-weaver-multi-scale-framework/src/build/lib
ecopia_weaver.so)
Anyone have ideas?
If I run the model on one of the two gpus, both can work.