[c++] convertTo error with kCUDA device

I am doing segmentation with libtorch.

The input is 224x224x3 and the output is a 224x224x1 mask image.
This works well on the CPU device, but when I try the GPU it crashes at the cv::Mat::convertTo call.
My code looks like this.

torch::NoGradGuard no_grad;
torch::Tensor val_loss;
for (torch::data::Example<torch::Tensor, torch::Tensor>& batch : *data_loader_val) {
    auto data = batch.data.to(device);
    auto targets = batch.target.to(device);
    try {
        torch::Tensor outputs = segmentation->forward(data);

        float* gt = targets.data_ptr<float>();
        float* pr = outputs.data_ptr<float>();
        char name[80];

        for (auto i = 0; i < outputs.sizes()[0]; i++) {
            // Ground-truth mask of the i-th image in the batch
            auto gttmp = gt + i * 224 * 224;
            cv::Mat gtmat(224, 224, CV_32FC1, gttmp);
            gtmat.convertTo(gtmat, CV_8UC1, 255);  // crashes here when device is kCUDA

            sprintf_s(name, "gt%d", i);
            cv::imshow(name, gtmat);

            // Predicted mask of the i-th image in the batch
            auto prtmp = pr + i * 224 * 224;
            cv::Mat ptmat(224, 224, CV_32FC1, prtmp);
            ptmat.convertTo(ptmat, CV_8UC1, 255);

            sprintf_s(name, "pr%d", i);
            cv::imshow(name, ptmat);
        }
        // ...
    } catch (...) {
        // ...
    }
}

Could I have a clue for this problem?

I would guess that cv::Mat is trying to access memory on the GPU (gt and pr both point to CUDA memory). Before calling data_ptr(), copy outputs and targets back to the CPU, e.g.:

		targets = targets.cpu().contiguous();
		outputs = outputs.cpu().contiguous();
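
Once the tensors are dense and on the host, the per-image pointer arithmetic from the question (gt + i * 224 * 224) becomes valid. Below is a minimal sketch of that step in plain C++, so it can be tried without libtorch or OpenCV. The function name mask_to_u8 and its parameters are hypothetical, and the scaling loop is only a rough stand-in for what convertTo(..., CV_8UC1, 255) does, not OpenCV's actual implementation. In the real code the batch pointer would come from something like outputs.cpu().contiguous().data_ptr<float>().

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: convert image `index` of a dense host batch
// (N x H x W floats, row-major) to 8-bit values.
// `batch` must point to CPU memory; a CUDA pointer cannot be
// dereferenced here, which is the crash described in the question.
std::vector<std::uint8_t> mask_to_u8(const float* batch, std::size_t index,
                                     std::size_t height, std::size_t width,
                                     float scale = 255.0f) {
    assert(batch != nullptr);
    // Same offset as `gt + i * 224 * 224`; only valid for a contiguous layout.
    const float* image = batch + index * height * width;

    std::vector<std::uint8_t> out(height * width);
    for (std::size_t k = 0; k < out.size(); ++k) {
        // Scale, round to the nearest integer, then clamp to [0, 255].
        float v = std::nearbyint(image[k] * scale);
        v = std::max(0.0f, std::min(255.0f, v));
        out[k] = static_cast<std::uint8_t>(v);
    }
    return out;
}
```

For a batch of two 2x2 masks stored back to back, mask_to_u8(ptr, 1, 2, 2) reads the second mask, i.e. the four floats starting at ptr + 4.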

Thanks!!! I solved this problem!
Does the contiguous function help copy CUDA memory to CPU memory?
I tried targets.cpu() and outputs.cpu(), but that didn't work.

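The copy from CUDA to CPU memory is done by cpu(); contiguous() only guarantees that the elements are laid out densely in row-major order. Since you're reading sequentially through the tensor's memory with a raw pointer, you need a contiguous tensor. Also note that cpu() returns a new tensor instead of modifying the original in place, so the result has to be assigned back (as in the snippet above) before calling data_ptr(). There is a pretty good explanation of what contiguous does over here: https://discuss.pytorch.org/t/contigious-vs-non-contigious-tensor/30107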
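
To illustrate why a raw pointer needs a contiguous layout, here is a small sketch in plain C++. StridedView, is_contiguous, make_contiguous and transpose are hypothetical names made up for this illustration, not libtorch types or functions; they only mimic the idea of sizes and strides in a simplified 2-D case.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2-D strided view over a float buffer (illustration only).
// Element (r, c) lives at data[r * row_stride + c * col_stride].
struct StridedView {
    const float* data;
    std::size_t rows, cols;
    std::size_t row_stride, col_stride;
};

// For this simple 2-D case: true when logical row-major order matches
// memory order, so the buffer can be read sequentially via a raw pointer.
bool is_contiguous(const StridedView& v) {
    return v.col_stride == 1 && v.row_stride == v.cols;
}

// Transpose without moving any data: swap the sizes and the strides.
StridedView transpose(const StridedView& v) {
    return {v.data, v.cols, v.rows, v.col_stride, v.row_stride};
}

// Copy the elements into a dense row-major buffer, which is roughly the
// work a contiguous() call has to do for a non-contiguous tensor.
std::vector<float> make_contiguous(const StridedView& v) {
    std::vector<float> out;
    out.reserve(v.rows * v.cols);
    for (std::size_t r = 0; r < v.rows; ++r) {
        for (std::size_t c = 0; c < v.cols; ++c) {
            out.push_back(v.data[r * v.row_stride + c * v.col_stride]);
        }
    }
    return out;
}
```

With a 2x3 buffer {1, 2, 3, 4, 5, 6}, the transposed view is logically {1, 4, 2, 5, 3, 6}, but a raw pointer into it would still walk 1, 2, 3, 4, 5, 6. Only the dense copy from make_contiguous matches the logical order when read sequentially.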