LibTorch C++ API for Channels Last Memory Format

Background

I have been using PyTorch on Android.
As mentioned in the linked article, processing is faster in the channels-last format, and in a real-world use case I have observed the same effect: inference is faster with the channels-last memory format.
For both the Android and Python libraries, there is a function to convert the input tensor to the channels_last memory format.
Python:

x = x.to(memory_format=torch.channels_last)

Android (Kotlin):

val inputTensor = TensorImageUtils.bitmapToFloat32Tensor(
    bitmap,
    IMAGE_MEAN,
    IMAGE_STD,
    MemoryFormat.CHANNELS_LAST
)

I now have a requirement to run the model in the Android NDK, which means I need to implement the inference in C++ using the libtorch library. However, I can’t find the function to convert the memory format to channels last.

This is what I am currently doing in libtorch in C++:

cv::Mat image = imgP.clone();
// uint8 HWC image -> float32 NCHW tensor scaled to [0, 1]
auto tensorImage = torch::from_blob(image.data, {1, image.rows, image.cols, 3}, torch::kByte)
                       .to(torch::kFloat32).permute({0, 3, 1, 2}).div(255.0);
// Tag the tensor as channels last (NHWC memory layout, NCHW sizes)
tensorImage = tensorImage.to(torch::MemoryFormat::ChannelsLast);
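
To check that the conversion actually took effect, one can inspect the tensor's layout flags and strides. Here is a minimal sketch; the 1x3x224x224 shape is just an illustrative placeholder:

#include <torch/torch.h>
#include <iostream>

int main() {
    // Dummy NCHW float tensor standing in for the preprocessed image
    auto t = torch::rand({1, 3, 224, 224});
    t = t.to(torch::MemoryFormat::ChannelsLast);

    // Sizes stay NCHW; only the underlying memory layout changes to NHWC
    std::cout << t.is_contiguous(torch::MemoryFormat::ChannelsLast) << "\n"; // 1 (true)
    std::cout << t.strides() << "\n"; // [150528, 1, 672, 3]
    return 0;
}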

Question: Is my approach in C++ correct?

Yes, your approach is correct: transforming the inputs (and the model's parameters) should work as shown here. tensor.to(torch::MemoryFormat::ChannelsLast) is the libtorch equivalent of the Python memory_format=torch.channels_last call.
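
For reference, here is a minimal end-to-end sketch of the NDK inference path. The helper name and model path are placeholders; torch::jit::load and forward are the standard TorchScript C++ entry points:

#include <torch/script.h>
#include <string>
#include <vector>

// Hypothetical helper: loads a TorchScript module and runs one forward
// pass with a channels-last input. modelPath is a placeholder.
torch::Tensor runInference(const std::string& modelPath, torch::Tensor input) {
    torch::jit::script::Module module = torch::jit::load(modelPath);
    module.eval();

    // Equivalent of x.to(memory_format=torch.channels_last) in Python
    input = input.contiguous(torch::MemoryFormat::ChannelsLast);

    std::vector<torch::jit::IValue> inputs{input};
    return module.forward(inputs).toTensor();
}

Note that converting the model's parameters to channels last is usually done on the Python side with model.to(memory_format=torch.channels_last) before exporting the TorchScript module, since the parameters travel with the exported file.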