Libtorch: creating a tensor from a cv::cuda::GpuMat causes a segmentation fault

I found that creating a tensor from a cv::cuda::GpuMat causes a segmentation fault.
My code:

cv::Mat image = cv::imread(argv[1], cv::IMREAD_COLOR);
cv::cuda::GpuMat image_device;
cv::cuda::GpuMat image_float_device;
image_device.upload(image);                            // copy host image to GPU
image_device.convertTo(image_float_device, CV_32FC3);  // convert 8U -> 32F on GPU
// wrap the raw GPU buffer as a CUDA float tensor
at::Tensor input_tensor = torch::CUDA(torch::kFloat32).tensorFromBlob(
    image_float_device.data,
    {image_float_device.rows, image_float_device.cols, 3});
std::cout << "all finished!" << std::endl;
return 0;

The message “all finished!” is printed, but then a segmentation fault occurs. I guess it may be caused by the tensor destructor?
How can I fix this problem?
Any help is appreciated~

Can anyone help?

See this:

I see no performance gain when using GpuMat vs using Mat.

@dambo We use cv::cuda::resize, which saves some time.
And we run the model on the GPU, so it is most natural to create the tensor directly from GPU memory, eliminating the “tensor.to(device)” copy.

Your answer doesn’t seem to explain why the segmentation fault happens.
Could you explain in more detail?

But does my code cause a segmentation fault for you as well?

@dancedpipi What was your ultimate result? Were you able to fix your problem?

@dambo Am I also misunderstanding from_blob? I was under the impression that using GpuMat should be more efficient than Mat when dealing with large datasets, because you avoid CPU-to-GPU copies. Am I incorrect here? Does from_blob perform a copy within the GPU, or does it just change how the data is referenced?