Android PyTorch model latency

I am developing with reference to the PyTorch object detection example.
I am trying to detect objects with a PyTorch model on Android.
The app receives a camera frame in real time as a bitmap, runs the model on it, and displays the detected object as the result.

However, the TensorImageUtils.bitmapToFloat32Tensor call is time-consuming and falls far behind the live camera feed.

    bitmap = Bitmap.createBitmap(bmp, 0, 0, bmp.width, bmp.height, matrix, true)
    val resizedBitmap = Bitmap.createScaledBitmap(bitmap, PrePostProcessor.mInputWidth, PrePostProcessor.mInputHeight, true)
    // This conversion runs per frame on the CPU and is the slow step
    inputTensor = TensorImageUtils.bitmapToFloat32Tensor(resizedBitmap, PrePostProcessor.NO_MEAN_RGB, PrePostProcessor.NO_STD_RGB)
    outputTuple = mModule!!.forward(IValue.from(inputTensor)).toTuple()
    outputTensor = outputTuple[0].toTensor()  // toTuple() already returns Array<IValue>; no cast needed
    output = outputTensor.dataAsFloatArray

Is there any way to speed this up? Should I force it to run on the GPU? If so, how?

How large is your image? Do you have any latency numbers?
bitmapToFloat32Tensor is a function from the torchvision Android package; I don't think it can run on the GPU.

The bitmapToFloat32Tensor call takes about 3 seconds.

mModule!!.forward(IValue.from(inputTensor)).toTuple() takes about 1 second.

The image size is 640 x 640.
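Three seconds for a 640x640 conversion is far above what the pixel loop alone should cost, so it is worth timing each pipeline stage separately to find the real bottleneck. A minimal sketch of such a timing helper, with a placeholder workload standing in for the actual conversion (timedMs and convertDemo are hypothetical names, not part of any PyTorch API):

```kotlin
import kotlin.system.measureNanoTime

// Hypothetical helper: wrap any pipeline stage (scale, convert, forward)
// and print how long it took, so the slow step can be identified.
inline fun timedMs(label: String, block: () -> Unit): Double {
    val ms = measureNanoTime(block) / 1e6
    println("$label took %.2f ms".format(ms))
    return ms
}

// Placeholder workload: normalize 640x640 ARGB pixels into a CHW float
// array, mimicking the per-frame cost of bitmapToFloat32Tensor.
fun convertDemo(): Double {
    val area = 640 * 640
    val pixels = IntArray(area) { 0xFF808080.toInt() }
    val out = FloatArray(3 * area)
    return timedMs("convert") {
        for (i in 0 until area) {
            val p = pixels[i]
            out[i] = ((p shr 16) and 0xFF) / 255f
            out[area + i] = ((p shr 8) and 0xFF) / 255f
            out[2 * area + i] = (p and 0xFF) / 255f
        }
    }
}
```

If the wrapped conversion really measures seconds rather than milliseconds, check whether the work runs on the UI thread and whether new bitmaps and tensors are allocated on every frame; both are common causes of this kind of slowdown.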