Strange behavior of torch::stack

Hi everyone, I’m using PyTorch for a project that involves OpenGL/CUDA interop and OpenCV (for visualization).

I’m quite new to PyTorch, so I might be missing something, but I’m having a problem with the stack function.

Basically, I’m rendering an image (640x640x4) with OpenGL and loading it into a tensor on the GPU.

I managed to visualize it properly by converting it to a BGRA matrix with OpenCV using data_ptr().
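
For context, a much-simplified sketch of that loading step (a CPU round trip via glReadPixels rather than my actual interop code, and ignoring the vertical flip that the read-back introduces):

// Sketch only: read the framebuffer back and wrap it as a [1, 4, 640, 640] float tensor.
std::vector<float> pixels(640 * 640 * 4);
glReadPixels(0, 0, 640, 640, GL_RGBA, GL_FLOAT, pixels.data());

torch::Tensor image_tensor = torch::from_blob(pixels.data(), {640, 640, 4}, torch::kFloat)
                                 .clone()             // own the memory before pixels goes out of scope
                                 .permute({2, 0, 1})  // HWC -> CHW
                                 .unsqueeze(0)        // add batch dim
                                 .to(torch::kCUDA);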

Then I split the image into 100 non-overlapping patches of (64x64x4) with index/slice operations:

torch::Tensor tensors_stacked;
std::vector<torch::Tensor> patches;
for (int i = 0; i < numPatchesY; ++i) {
    for (int j = 0; j < numPatchesX; ++j) {
        torch::Tensor patch = image_tensor.index({0,
                              torch::indexing::Slice(),
                              torch::indexing::Slice(i * patchSize, (i + 1) * patchSize),
                              torch::indexing::Slice(j * patchSize, (j + 1) * patchSize)}).clone();
        // Visualizing the patch tensor here gives expected results.
        patches.push_back(patch);
    }
}
// Then I stack the patches into a higher-dimensional tensor.
tensors_stacked = torch::stack(patches);

//Visualizing the patch tensor tensors_stacked[0] here gives weird patterns.
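
As an aside (probably unrelated to the issue), the same 10x10 split can apparently also be written more compactly with unfold. A rough sketch, assuming patchSize is 64 and image_tensor is [1,4,640,640]:

// Sketch: build the same [100, 4, 64, 64] stack without an explicit loop.
torch::Tensor grid = image_tensor
    .unfold(2, patchSize, patchSize)    // [1, 4, 10, 640, 64]   split the height
    .unfold(3, patchSize, patchSize);   // [1, 4, 10, 10, 64, 64] split the width
torch::Tensor stacked_alt = grid
    .permute({0, 2, 3, 1, 4, 5})        // [1, 10, 10, 4, 64, 64] patch indices first
    .reshape({-1, 4, patchSize, patchSize});  // [100, 4, 64, 64], same i-outer/j-inner order as the loop

In what follows I keep the explicit loop, since it is easier to inspect each patch.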


My problem is that a patch taken from the stacked tensor doesn’t look the same as the one obtained directly from slicing, as you can see below:
Visualizing the patch after slicing => everything is fine! (New-user restrictions mean I can only attach one image.)
Visualizing the corresponding patch after stacking => [attached image model_25_view_-1: the patch shows weird patterns]

I’ve checked the min/max values and the dtype of the input image_tensor, the patch, and tensors_stacked[0], and they are the same.
I’ve tried changing the memory layout (contiguous, channels-first, etc.), but nothing did the trick.

Regarding the strides and sizes (a minimal way to print them is sketched below):

  • My input image_tensor has sizes [1,4,640,640] and strides [1638400,1,2560,4]
  • The patch created by the slice operation has sizes [4,64,64] and strides [4096,64,1]
  • tensors_stacked has sizes [100,4,64,64] and strides [64,1,4096]
  • Finally, tensors_stacked[0] has sizes [4,64,64] and strides [4096,64,1]
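
Here is the sketch I mean, assuming image_tensor, patches, and tensors_stacked are still in scope (patches[0] standing in for one sliced patch):

// Sketch: dump sizes, strides and contiguity of the tensors involved.
auto dump = [](const std::string& name, const torch::Tensor& t) {
    std::cout << name << ": sizes=" << t.sizes()
              << " strides=" << t.strides()
              << " contiguous=" << t.is_contiguous() << std::endl;
};
dump("image_tensor", image_tensor);
dump("patch", patches[0]);
dump("tensors_stacked", tensors_stacked);
dump("tensors_stacked[0]", tensors_stacked[0]);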

Like I said, after the stacking operation I’m not able to display the patch properly and I don’t understand why this is not working.

I’d be very grateful for any explanation of what’s going on here!

The output looks interleaved, which might be caused by the actual code used to visualize the patch, so could you share it, please?
A quick test in Python shows that your general slicing is correct:

x = torch.randn(1, 4, 640, 640)
stride = 64

patches = []
for i in range(10):
    for j in range(10):
        patch = x[:, :, i*stride : (i+1)*stride, j*stride : (j+1)*stride]
        patches.append(patch)

stacked = torch.stack(patches)
print(stacked.shape)
# torch.Size([100, 1, 4, 64, 64])

# test
print((stacked[0] == x[:, :, :stride, :stride]).all())
# tensor(True)
print((stacked[-1] == x[:, :, x.size(2)-stride:, x.size(3)-stride:]).all())
# tensor(True)
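
In case it helps to run the same check from the C++ side, a rough LibTorch equivalent of the test above (a sketch, untested):

// Sketch: LibTorch version of the Python sanity check.
torch::Tensor x = torch::randn({1, 4, 640, 640});
int64_t stride = 64;

std::vector<torch::Tensor> patches;
for (int64_t i = 0; i < 10; ++i) {
    for (int64_t j = 0; j < 10; ++j) {
        patches.push_back(x.index({torch::indexing::Slice(),
                                   torch::indexing::Slice(),
                                   torch::indexing::Slice(i * stride, (i + 1) * stride),
                                   torch::indexing::Slice(j * stride, (j + 1) * stride)}));
    }
}

torch::Tensor stacked = torch::stack(patches);
std::cout << stacked.sizes() << std::endl;  // [100, 1, 4, 64, 64]
std::cout << torch::equal(stacked[0],
                          x.index({torch::indexing::Slice(),
                                   torch::indexing::Slice(),
                                   torch::indexing::Slice(0, stride),
                                   torch::indexing::Slice(0, stride)})) << std::endl;  // 1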

Thank you for the test.

Operations done on the tensor before visualizing:


for (int i = 0; i < tensors_stacked.size(0); i++) {
    // Tensor values are in [-1,1] for model purposes. Converting back to [0,1].
    torch::Tensor debug_print = (tensors_stacked[i].clone() * 0.5) + 0.5;
    // Going from [4,64,64] to [64,64,4]
    debug_print = debug_print.permute({1, 2, 0});
    // Adding one extra dim -> [1,64,64,4] to match the function's input format
    debug_print = debug_print.view({1, 64, 64, 4});
    saveTensorToImage(debug_print, "debug/", i, -1);
}

Note: patches from the slicing operation (before stacking) display fine with the same transformations, so I assume the normalisation/permutation/view steps are safe.
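
To double-check that assumption, a small sketch I can run (assuming both patches and tensors_stacked are still in scope) to compare what actually reaches saveTensorToImage in the two cases:

// Sketch: compare the tensor handed to saveTensorToImage in both paths.
torch::Tensor from_slice = ((patches[0].clone() * 0.5) + 0.5)
                               .permute({1, 2, 0})
                               .view({1, 64, 64, 4});
torch::Tensor from_stack = ((tensors_stacked[0].clone() * 0.5) + 0.5)
                               .permute({1, 2, 0})
                               .view({1, 64, 64, 4});
std::cout << "from_slice: contiguous=" << from_slice.is_contiguous()
          << " strides=" << from_slice.strides() << std::endl;
std::cout << "from_stack: contiguous=" << from_stack.is_contiguous()
          << " strides=" << from_stack.strides() << std::endl;
std::cout << "equal values: " << torch::equal(from_slice, from_stack) << std::endl;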

Now the part related to visualizing:

void saveTensorToImage(const torch::Tensor &texture_tensor, std::string folder, int current_token, int counter_frames) {
    // Sending data to the CPU and removing the extra dim
    torch::Tensor tensor_cpu = texture_tensor.clone().to(torch::kCPU)[0];

    cv::Mat save_img = ToCvImage(tensor_cpu);
    if (save_img.empty())
    {
        std::cerr << "Something is wrong with the image, could not get frame." << std::endl;
        return;
    }

    // Save the frame into a file
    char tmp[4096];
    snprintf(tmp, sizeof(tmp), "%s%s%02d_view_%02d.png", folder.c_str(), "/model_", current_token, counter_frames);
    cv::Mat bgra8;
    save_img.convertTo(bgra8, CV_8UC4, 255); // [0,1] float image to [0,255] int
    cv::imwrite(tmp, bgra8); // saving a PNG file
}

// Convert an RGBA float tensor to a BGRA image
cv::Mat ToCvImage(const torch::Tensor &tensor) {
    int height = tensor.size(0);
    int width = tensor.size(1);
    try {
        // Assuming the tensor has 4 channels (RGBA)
        int channels = tensor.size(2);

        // Check the data type of the tensor
        at::ScalarType dtype = tensor.scalar_type();
        cv::Mat output_mat;
        if (dtype == torch::kFloat) {
            if (channels == 4) {
                // Four channels (RGBA); wrap the tensor memory without copying
                output_mat = cv::Mat(height, width, CV_32FC4, tensor.data_ptr<float>());
                std::vector<cv::Mat> planes(4);
                cv::split(output_mat, planes);

                // Swap the Red and Blue planes: RGBA => BGRA
                std::swap(planes[0], planes[2]);

                // Merge the planes back to form the BGRA Mat
                cv::Mat bgraMat;
                cv::merge(planes, bgraMat);
                output_mat = bgraMat.clone();
            } else {
                // Unsupported number of channels
                std::cerr << "Unsupported number of channels: " << channels << std::endl;
                return cv::Mat();
            }
        } else {
            // Unsupported data type
            std::cerr << "Unsupported data type: " << dtype << std::endl;
            return cv::Mat();
        }
        return output_mat;
    } catch (const c10::Error &e) {
        std::cout << "An error has occurred: " << e.msg() << std::endl;
    }
    return cv::Mat();  // Return an empty Mat on failure
}
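
For what it’s worth, the split/swap/merge part could probably also be written with cv::cvtColor. A rough sketch of such an alternative (ToCvImageAlt is just a hypothetical name), assuming the incoming tensor is a float HWC RGBA tensor on the CPU:

cv::Mat ToCvImageAlt(const torch::Tensor &tensor_in) {
    // Enforce a contiguous HWC layout: the cv::Mat constructor below does not copy,
    // it just reinterprets data_ptr() as row-major interleaved float RGBA data.
    torch::Tensor tensor = tensor_in.contiguous();
    int height = static_cast<int>(tensor.size(0));
    int width  = static_cast<int>(tensor.size(1));

    cv::Mat rgba(height, width, CV_32FC4, tensor.data_ptr<float>());
    cv::Mat bgra;
    cv::cvtColor(rgba, bgra, cv::COLOR_RGBA2BGRA);  // swap R and B, keep alpha
    return bgra;  // cvtColor writes into its own buffer, so this is safe to return
}

The main difference to my current version is the explicit .contiguous() call before handing the pointer to OpenCV.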

Again, thank you for your time !