TorchScript Model Performance and Multi-Threading Issue

Hello,

I’m running a TorchScript model that takes a 3-channel 640x640 float image as input, normalized to the range 0.0 to 1.0. However, my input is a single-channel uint8 image.

Initially, I used the following code to preprocess the image:

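	cv::Mat& image = images[i];
	// Convert to float in place and scale to [0, 1].
	image.convertTo(image, CV_32F, 1.0 / 255.0);
	// Wrap the cv::Mat buffer (HWC layout) without copying, then move it to the device.
	torch::Tensor tensor_image = torch::from_blob(image.data, { height, width, channels }, torch::kFloat).to(device);
	// Reorder HWC -> CHW.
	tensor_image = tensor_image.permute({ 2, 0, 1 });
	// clone() gives the stored tensor its own memory.
	tensor_images.push_back(tensor_image.clone());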

This implementation works correctly, but it’s slow. To improve performance, I changed it to send the original uint8 image to the GPU first and do the float conversion and normalization there:

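	// Take channel 0 of the uint8 image.
	cv::Mat single_channel;
	cv::extractChannel(image, single_channel, 0);
	// from_blob wraps single_channel's buffer without copying or taking ownership.
	torch::Tensor tensor_image = torch::from_blob(single_channel.data, { height, width }, torch::kByte);
	// Move to the device as float, then scale to [0, 1] in place.
	tensor_image = tensor_image.to(device, torch::kFloat).div_(255.0);
	// expand() returns a view that repeats the single channel three times without copying.
	tensor_image = tensor_image.unsqueeze(0).expand({ 3, height, width });
	tensor_images.push_back(tensor_image);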

This approach is significantly faster. However, when running it in my multi-threaded system, it crashes.
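
For reference, the result both versions are meant to produce can be sketched in plain C++, without OpenCV or LibTorch. This is only an illustration, not the code I run: the helper name `gray_to_chw3` and the use of `std::vector` are hypothetical, and it assumes the input is a contiguous, row-major, single-channel uint8 buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of OpenCV or LibTorch): converts a contiguous,
// row-major, single-channel uint8 image into a 3 x H x W float buffer (CHW
// layout) with values scaled to [0, 1]. The gray channel is copied into all
// three output channels.
std::vector<float> gray_to_chw3(const std::vector<std::uint8_t>& pixels,
                                std::size_t height, std::size_t width) {
    // Assumption: the buffer holds exactly height * width pixels, no row padding.
    assert(pixels.size() == height * width);

    const std::size_t plane = height * width;
    std::vector<float> out(3 * plane);
    for (std::size_t i = 0; i < plane; ++i) {
        const float v = static_cast<float>(pixels[i]) / 255.0f;
        out[i] = v;              // channel 0
        out[plane + i] = v;      // channel 1
        out[2 * plane + i] = v;  // channel 2
    }
    return out;
}
```

Each call returns a buffer it owns, so nothing is shared between threads. Comparing a tensor produced by either snippet above against this reference on a small image would be one way to check that the faster version still computes the same values.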

Do you have any insights or suggestions on how to resolve this issue?

I appreciate your help in advance!