Hello,
I’m running a TorchScript model that takes a 3-channel 640x640 float image as input, normalized to the range [0.0, 1.0]. However, my input is a single-channel uint8 image.
Initially, I used the following code to preprocess the image:
cv::Mat& image = images[i];
// Convert uint8 [0, 255] to float32 [0.0, 1.0] on the CPU.
image.convertTo(image, CV_32F, 1.0 / 255.0);
// Wrap the Mat's buffer without copying, then copy the full float image to the device.
torch::Tensor tensor_image = torch::from_blob(image.data, { height, width, channels }, torch::kFloat).to(device);
// HWC -> CHW, as the model expects.
tensor_image = tensor_image.permute({ 2, 0, 1 });
// clone() so the stored tensor owns its own memory.
tensor_images.push_back(tensor_image.clone());
This implementation works correctly, but it is slow. To improve performance, I changed it to upload the original uint8 image to the GPU first and do the float conversion and normalization there:
cv::Mat single_channel;
cv::extractChannel(image, single_channel, 0);
// Wrap the uint8 buffer without copying; the tensor does not own this memory.
torch::Tensor tensor_image = torch::from_blob(single_channel.data, { height, width }, torch::kByte);
// Copy the small uint8 image to the device, then convert and normalize there.
tensor_image = tensor_image.to(device, torch::kFloat).div_(255.0);
// Replicate the single channel into the 3 channels the model expects (a view, no copy).
tensor_image = tensor_image.unsqueeze(0).expand({ 3, height, width });
tensor_images.push_back(tensor_image);
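(For what it’s worth, the two paths should produce numerically identical tensors; a quick check along these lines can verify that, where slow_tensor and fast_tensor are hypothetical names for the outputs of the two versions above:)
// Hypothetical sanity check: both preprocessing paths should yield the same
// normalized {3, height, width} tensor on the device.
bool same = torch::allclose(slow_tensor, fast_tensor);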
This approach is significantly faster. However, when I run it in my multi-threaded system, it crashes.
Do you have any insights or suggestions on how to resolve this issue?
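For reference, one workaround I am considering, in case the crash is a use-after-free on the buffer that from_blob wraps (just a guess on my part), is to let the tensor own its data before single_channel can be destroyed or reused by another thread. A minimal sketch:
cv::Mat single_channel;
cv::extractChannel(image, single_channel, 0);
// from_blob() only wraps single_channel.data; clone() immediately copies the
// bytes into tensor-owned memory, so the Mat's lifetime no longer matters.
torch::Tensor tensor_image = torch::from_blob(single_channel.data, { height, width }, torch::kByte).clone();
tensor_image = tensor_image.to(device, torch::kFloat).div_(255.0);
tensor_image = tensor_image.unsqueeze(0).expand({ 3, height, width });
tensor_images.push_back(tensor_image);
Does that lifetime theory sound plausible, or is something else the more likely culprit?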
Thanks in advance for your help!