Strange behavior of torch::stack

Hi everyone, I’m using PyTorch for a project that involves OpenGL/CUDA interop and OpenCV (for visualization).

I’m quite new to PyTorch, so I might be missing something, but I’m having a problem with the stack function.

Basically, I’m rendering an image (640x640x4) with OpenGL and loading it into a tensor on the GPU.

I managed to visualize it properly by converting it to a BGRA matrix with OpenCV using data_ptr().
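
For context, a much-simplified sketch of that loading step (a CPU round trip via glReadPixels rather than my actual interop code, and ignoring the vertical flip that the read-back introduces):

// Sketch only: read the framebuffer back and wrap it as a [1, 4, 640, 640] float tensor.
std::vector<float> pixels(640 * 640 * 4);
glReadPixels(0, 0, 640, 640, GL_RGBA, GL_FLOAT, pixels.data());

torch::Tensor image_tensor = torch::from_blob(pixels.data(), {640, 640, 4}, torch::kFloat)
                                 .clone()             // own the memory before pixels goes out of scope
                                 .permute({2, 0, 1})  // HWC -> CHW
                                 .unsqueeze(0)        // add batch dim
                                 .to(torch::kCUDA);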

Then I split the image into 100 non-overlapping patches of (64x64x4) with index/slice operations:

torch::Tensor tensors_stacked;
std::vector<torch::Tensor> patches;
for (int i = 0; i < numPatchesY; ++i) {
    for (int j = 0; j < numPatchesX; ++j) {
        torch::Tensor patch = image_tensor.index({0,
                              torch::indexing::Slice(),
                              torch::indexing::Slice(i * patchSize, (i + 1) * patchSize),
                              torch::indexing::Slice(j * patchSize, (j + 1) * patchSize)}).clone();
        // Visualizing the patch tensor here gives expected results.
        patches.push_back(patch);
    }
}
// Then I stack the patches into a higher-dimensional tensor.
tensors_stacked = torch::stack(patches);

//Visualizing the patch tensor tensors_stacked[0] here gives weird patterns.
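
As an aside (probably unrelated to the issue), the same 10x10 split can apparently also be written more compactly with unfold. A rough sketch, assuming patchSize is 64 and image_tensor is [1,4,640,640]:

// Sketch: build the same [100, 4, 64, 64] stack without an explicit loop.
torch::Tensor grid = image_tensor
    .unfold(2, patchSize, patchSize)    // [1, 4, 10, 640, 64]   split the height
    .unfold(3, patchSize, patchSize);   // [1, 4, 10, 10, 64, 64] split the width
torch::Tensor stacked_alt = grid
    .permute({0, 2, 3, 1, 4, 5})        // [1, 10, 10, 4, 64, 64] patch indices first
    .reshape({-1, 4, patchSize, patchSize});  // [100, 4, 64, 64], same i-outer/j-inner order as the loop

In what follows I keep the explicit loop, since it is easier to inspect each patch.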


My problem is that a patch taken from the stacked tensor doesn’t look the same as the one obtained directly from slicing, as you can see below:
Visualizing the patch after slicing => everything is fine! (New-user restrictions mean I can only attach one image.)
Visualizing the corresponding patch after stacking => [attached image model_25_view_-1: the patch shows weird patterns]

I’ve checked the min/max values and the dtype of the input image_tensor, the patch, and tensors_stacked[0], and they are the same.
I’ve tried changing the memory layout (contiguous, channels-first, etc.), but nothing did the trick.

Regarding the strides and sizes (a minimal way to print them is sketched below):

  • My input image_tensor has sizes [1,4,640,640] and strides [1638400,1,2560,4]
  • The patch created by the slice operation has sizes [4,64,64] and strides [4096,64,1]
  • tensors_stacked has sizes [100,4,64,64] and strides [64,1,4096]
  • Finally, tensors_stacked[0] has sizes [4,64,64] and strides [4096,64,1]
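
Here is the sketch I mean, assuming image_tensor, patches, and tensors_stacked are still in scope (patches[0] standing in for one sliced patch):

// Sketch: dump sizes, strides and contiguity of the tensors involved.
auto dump = [](const std::string& name, const torch::Tensor& t) {
    std::cout << name << ": sizes=" << t.sizes()
              << " strides=" << t.strides()
              << " contiguous=" << t.is_contiguous() << std::endl;
};
dump("image_tensor", image_tensor);
dump("patch", patches[0]);
dump("tensors_stacked", tensors_stacked);
dump("tensors_stacked[0]", tensors_stacked[0]);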

Like I said, after the stacking operation I’m not able to display the patch properly and I don’t understand why this is not working.

I’d be very grateful for any explanation of what’s going on here!

The output looks interleaved, which might be caused by the actual code used to visualize the patch, so could you share it, please?
A quick test in Python shows that your general slicing is correct:

x = torch.randn(1, 4, 640, 640)
stride = 64

patches = []
for i in range(10):
    for j in range(10):
        patch = x[:, :, i*stride : (i+1)*stride, j*stride : (j+1)*stride]
        patches.append(patch)

stacked = torch.stack(patches)
print(stacked.shape)
# torch.Size([100, 1, 4, 64, 64])

# test
print((stacked[0] == x[:, :, :stride, :stride]).all())
# tensor(True)
print((stacked[-1] == x[:, :, x.size(2)-stride:, x.size(3)-stride:]).all())
# tensor(True)
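
In case it helps to run the same check from the C++ side, a rough LibTorch equivalent of the test above (a sketch, untested):

// Sketch: LibTorch version of the Python sanity check.
torch::Tensor x = torch::randn({1, 4, 640, 640});
int64_t stride = 64;

std::vector<torch::Tensor> patches;
for (int64_t i = 0; i < 10; ++i) {
    for (int64_t j = 0; j < 10; ++j) {
        patches.push_back(x.index({torch::indexing::Slice(),
                                   torch::indexing::Slice(),
                                   torch::indexing::Slice(i * stride, (i + 1) * stride),
                                   torch::indexing::Slice(j * stride, (j + 1) * stride)}));
    }
}

torch::Tensor stacked = torch::stack(patches);
std::cout << stacked.sizes() << std::endl;  // [100, 1, 4, 64, 64]
std::cout << torch::equal(stacked[0],
                          x.index({torch::indexing::Slice(),
                                   torch::indexing::Slice(),
                                   torch::indexing::Slice(0, stride),
                                   torch::indexing::Slice(0, stride)})) << std::endl;  // 1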

Thank you for the test.

Operations done on the tensor before visualizing:


for (int i = 0; i < tensors_stacked.size(0); i++) {
    // Tensor values are in [-1,1] for model purposes. Converting back to [0,1].
    torch::Tensor debug_print = (tensors_stacked[i].clone() * 0.5) + 0.5;
    // Going from [4,64,64] to [64,64,4]
    debug_print = debug_print.permute({1, 2, 0});
    // Adding one extra dim -> [1,64,64,4] to match the function's input format
    debug_print = debug_print.view({1, 64, 64, 4});
    saveTensorToImage(debug_print, "debug/", i, -1);
}

Note: patches from the slicing operation (before stacking) display fine with the same transformations, so I assume the normalisation/permutation/view steps are safe.
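
To double-check that assumption, a small sketch I can run (assuming both patches and tensors_stacked are still in scope) to compare what actually reaches saveTensorToImage in the two cases:

// Sketch: compare the tensor handed to saveTensorToImage in both paths.
torch::Tensor from_slice = ((patches[0].clone() * 0.5) + 0.5)
                               .permute({1, 2, 0})
                               .view({1, 64, 64, 4});
torch::Tensor from_stack = ((tensors_stacked[0].clone() * 0.5) + 0.5)
                               .permute({1, 2, 0})
                               .view({1, 64, 64, 4});
std::cout << "from_slice: contiguous=" << from_slice.is_contiguous()
          << " strides=" << from_slice.strides() << std::endl;
std::cout << "from_stack: contiguous=" << from_stack.is_contiguous()
          << " strides=" << from_stack.strides() << std::endl;
std::cout << "equal values: " << torch::equal(from_slice, from_stack) << std::endl;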

Now the part related to visualizing:

void saveTensorToImage(const torch::Tensor &texture_tensor, std::string folder, int current_token, int counter_frames) {
    // Sending data to the CPU and removing the extra dim
    torch::Tensor tensor_cpu = texture_tensor.clone().to(torch::kCPU)[0];

    cv::Mat save_img = ToCvImage(tensor_cpu);
    if (save_img.empty())
    {
        std::cerr << "Something is wrong with the image, could not get frame." << std::endl;
        return;
    }

    // Save the frame into a file
    char tmp[4096];
    snprintf(tmp, sizeof(tmp), "%s%s%02d_view_%02d.png", folder.c_str(), "/model_", current_token, counter_frames);
    cv::Mat bgra8;
    save_img.convertTo(bgra8, CV_8UC4, 255); // [0,1] float image to [0,255] int
    cv::imwrite(tmp, bgra8); // saving a PNG file
}

// Convert an RGBA float tensor to a BGRA image
cv::Mat ToCvImage(const torch::Tensor &tensor) {
    int height = tensor.size(0);
    int width = tensor.size(1);
    try {
        // Assuming the tensor has 4 channels (RGBA)
        int channels = tensor.size(2);

        // Check the data type of the tensor
        at::ScalarType dtype = tensor.scalar_type();
        cv::Mat output_mat;
        if (dtype == torch::kFloat) {
            if (channels == 4) {
                // Four channels (RGBA); wrap the tensor memory without copying
                output_mat = cv::Mat(height, width, CV_32FC4, tensor.data_ptr<float>());
                std::vector<cv::Mat> planes(4);
                cv::split(output_mat, planes);

                // Swap the Red and Blue planes: RGBA => BGRA
                std::swap(planes[0], planes[2]);

                // Merge the planes back to form the BGRA Mat
                cv::Mat bgraMat;
                cv::merge(planes, bgraMat);
                output_mat = bgraMat.clone();
            } else {
                // Unsupported number of channels
                std::cerr << "Unsupported number of channels: " << channels << std::endl;
                return cv::Mat();
            }
        } else {
            // Unsupported data type
            std::cerr << "Unsupported data type: " << dtype << std::endl;
            return cv::Mat();
        }
        return output_mat;
    } catch (const c10::Error &e) {
        std::cout << "An error has occurred: " << e.msg() << std::endl;
    }
    return cv::Mat();  // Return an empty Mat on failure
}
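
For what it’s worth, the split/swap/merge part could probably also be written with cv::cvtColor. A rough sketch of such an alternative (ToCvImageAlt is just a hypothetical name), assuming the incoming tensor is a float HWC RGBA tensor on the CPU:

cv::Mat ToCvImageAlt(const torch::Tensor &tensor_in) {
    // Enforce a contiguous HWC layout: the cv::Mat constructor below does not copy,
    // it just reinterprets data_ptr() as row-major interleaved float RGBA data.
    torch::Tensor tensor = tensor_in.contiguous();
    int height = static_cast<int>(tensor.size(0));
    int width  = static_cast<int>(tensor.size(1));

    cv::Mat rgba(height, width, CV_32FC4, tensor.data_ptr<float>());
    cv::Mat bgra;
    cv::cvtColor(rgba, bgra, cv::COLOR_RGBA2BGRA);  // swap R and B, keep alpha
    return bgra;  // cvtColor writes into its own buffer, so this is safe to return
}

The main difference to my current version is the explicit .contiguous() call before handing the pointer to OpenCV.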

Again, thank you for your time !