Background
I have been using PyTorch on Android.
As mentioned in the following article link, processing is faster in the channels_last memory format, and in a real-world use case I have observed the same effect: inference is faster when the input is in the channels last memory format.
Now, for both the Android and Python libraries, there is a function to convert the input tensor to the channels_last memory format.
Python:
x = x.to(memory_format=torch.channels_last)
Android (Kotlin):
val inputTensor = TensorImageUtils.bitmapToFloat32Tensor(
bitmap,
IMAGE_MEAN,
IMAGE_STD,
MemoryFormat.CHANNELS_LAST
)
However, I now have a requirement to run the model in the Android NDK, which means implementing the inference in C++ with the libtorch library. There, I can't find the function to convert the tensor to the channels last memory format.
This is what I am currently doing in libtorch in C++:
cv::Mat image = imgP.clone();
// HWC uint8 OpenCV image -> NCHW float tensor scaled to [0, 1]
auto tensorImage = torch::from_blob(image.data, {1, image.rows, image.cols, 3}, torch::kByte)
                       .to(torch::kFloat32)
                       .permute({0, 3, 1, 2})
                       .div(255.0);
tensorImage = tensorImage.to(torch::MemoryFormat::ChannelsLast);