I used PyTorch to train a Faster R-CNN model that locates a hand in an image, then converted the model to TorchScript by tracing. I am sure that both the model and the input are placed on the GPU, but inference always fails with the error "Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!". The log is listed below. Can anyone help me solve this issue?
terminate called after throwing an instance of 'std::runtime_error'
what(): The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/torch.py", line 8, in
def forward(self: torch.TraceWrapper,
argument_1: Tensor) -> Tuple[Tensor, Tensor, Tensor]:
_0, _1, _2, = (self.model).forward(argument_1, )
return (_0, _1, _2)
File "code/__torch__/torchvision/models/detection/faster_rcnn.py", line 18, in forward
s = ops.prim.NumToTensor(torch.size(img, 1))
s0 = ops.prim.NumToTensor(torch.size(img, 2))
_4, _5, _6, = (_3).forward(argument_1, )
~~~~~~~~~~~ <--- HERE
_7, _8, _9, _10, _11, _12, _13, _14, _15, _16, = (_2).forward(_4, )
_17 = (_1).forward(_7, _8, _9, _10, _11, _12, _13, _14, _15, _4, _5, _6, )
File "code/__torch__/torchvision/models/detection/transform.py", line 12, in forward
_0 = torch.slice(mean, 0, 0, 9223372036854775807, 1)
_1 = torch.unsqueeze(torch.unsqueeze(_0, 1), 2)
_2 = torch.sub(image, _1, alpha=1)
~~~~~~~~~ <--- HERE
_3 = torch.slice(std, 0, 0, 9223372036854775807, 1)
_4 = torch.unsqueeze(torch.unsqueeze(_3, 1), 2)
Traceback of TorchScript, original code (most recent call last):
/home/kevin/anaconda3/envs/script/lib/python3.6/site-packages/torchvision/models/detection/transform.py(124): normalize
/home/kevin/anaconda3/envs/script/lib/python3.6/site-packages/torchvision/models/detection/transform.py(104): forward
/home/kevin/anaconda3/envs/script/lib/python3.6/site-packages/torch/nn/modules/module.py(704): _slow_forward
/home/kevin/anaconda3/envs/script/lib/python3.6/site-packages/torch/nn/modules/module.py(720): _call_impl
/home/kevin/anaconda3/envs/script/lib/python3.6/site-packages/torchvision/models/detection/generalized_rcnn.py(79): forward
/home/kevin/anaconda3/envs/script/lib/python3.6/site-packages/torch/nn/modules/module.py(704): _slow_forward
/home/kevin/anaconda3/envs/script/lib/python3.6/site-packages/torch/nn/modules/module.py(720): _call_impl
faster_rcnn_script.py(34): forward
/home/kevin/anaconda3/envs/script/lib/python3.6/site-packages/torch/nn/modules/module.py(704): _slow_forward
/home/kevin/anaconda3/envs/script/lib/python3.6/site-packages/torch/nn/modules/module.py(720): _call_impl
/home/kevin/anaconda3/envs/script/lib/python3.6/site-packages/torch/jit/__init__.py(1109): trace_module
/home/kevin/anaconda3/envs/script/lib/python3.6/site-packages/torch/jit/__init__.py(955): trace
faster_rcnn_script.py(17): do_trace
faster_rcnn_script.py(64): save_jit_model
faster_rcnn_script.py(89): main
faster_rcnn_script.py(93): <module>
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
My code is:
```
cv::Mat pred_img = m_predict_img.clone();
// Scale pixel values to [0, 1] as 32-bit floats.
#ifdef RGB
pred_img.convertTo(*m_img_float, CV_32FC3, 1.0 / 255.0);
#else
pred_img.convertTo(*m_img_float, CV_32FC1, 1.0 / 255.0);
#endif

#ifdef TRACE
// Traced model: a single batched tensor.
torch::Tensor tensor_image = torch::from_blob(
    m_img_float->data,
    {1, m_img_float->rows, m_img_float->cols, m_img_float->channels()},
    torch::kF32);
tensor_image = tensor_image.permute({0, 3, 1, 2});  // trace: NHWC -> NCHW
#else
// Scripted model: a list of unbatched images.
torch::Tensor tensor_image = torch::from_blob(
    m_img_float->data,
    {m_img_float->rows, m_img_float->cols, m_img_float->channels()},
    torch::kF32);
tensor_image = tensor_image.permute({2, 0, 1});  // script: HWC -> CHW
#endif

auto img_var = torch::autograd::make_variable(tensor_image, false);
std::vector<torch::jit::IValue> inputs;
torch::jit::IValue output;
if (check_gpu_available()) {
#ifdef TRACE
    inputs.push_back(img_var.to(torch::kCUDA));
    //inputs.push_back(img_var.to(torch::kCPU));
#else
    inputs.push_back(c10::List<torch::Tensor>({img_var.to(torch::kCUDA)}));
    //inputs.push_back(c10::List<at::Tensor>({img_var.to(torch::kCPU)}));
#endif
    std::cout << "before prediction!" << std::endl;
    output = m_crop_module.forward(inputs);
    std::cout << "after prediction!" << std::endl;
} else {
    inputs.push_back(c10::List<at::Tensor>({img_var.to(torch::kCPU)}));
    output = m_crop_module.forward(inputs);
}
inputs.pop_back();

auto out = output.toTuple()->elements();
```
My environment is:
Ubuntu 20.04 + libtorch 1.6.0 + torchvision 0.7.0
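One thing that may be worth checking (an assumption, not something the log alone confirms) is the device used at tracing time. The traceback fails at `torch.sub(image, _1)`, where `_1` is derived from `mean` in `transform.py`. During `torch.jit.trace`, a value such as `image.device` is an ordinary Python object, so the device on which `mean` is created can end up recorded as a constant. If the trace in `faster_rcnn_script.py` was run with the model and example input on the CPU, `mean` could stay on the CPU even when the loaded module and its input are on `cuda:0`. The sketch below is a simplified stand-in for that situation, not the torchvision code: `Normalize` and `trace_on` are hypothetical names, and the mean values are illustrative. It shows tracing on the same device that will be used for inference.

```python
import torch


class Normalize(torch.nn.Module):
    """Hypothetical stand-in for a normalize step that builds `mean` inside forward."""

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # During tracing, image.device is a plain Python value, so the device
        # of `mean` may be fixed to whatever device the trace was run on.
        mean = torch.as_tensor(
            [0.485, 0.456, 0.406], dtype=image.dtype, device=image.device
        )
        return image - mean[:, None, None]


def trace_on(device: torch.device) -> torch.jit.ScriptModule:
    # Trace with the module and the example input already on the target device.
    example = torch.rand(3, 8, 8, device=device)
    return torch.jit.trace(Normalize().to(device), example)


# Use CUDA when it is available so the sketch also runs on CPU-only machines.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
traced = trace_on(device)
out = traced(torch.rand(3, 8, 8, device=device))
print(out.device)
```

If this turns out to be the cause, re-exporting the model after moving both the model and the example input to `cuda:0` in the tracing script, or exporting with `torch.jit.script` instead of tracing, would be the two options to try.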