Is there any way to process image tensors on the GPU with OpenCV or another library?

I use a network that outputs images as tensors on the GPU, but some post-processing steps still have to be applied to them.
Currently I’m using OpenCV with CUDA for those steps.

import cv2

# Device -> host copy, then host -> device upload
cpu_numpy = gpu_tensor.byte().cpu().numpy()
gpu_mat = cv2.cuda_GpuMat()
gpu_mat.upload(cpu_numpy)

But this means the image data is copied from GPU memory to host memory and then back to the GPU.

Is there a way to process image tensors on the GPU directly, without this round trip?
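
For context, the kind of thing I'm after would look roughly like the sketch below: a simple box blur written with plain PyTorch ops, so the tensor stays on whatever device it already lives on. The blur itself, the `box_blur` helper name, and the float (N, C, H, W) layout are only assumptions for illustration, not my actual post-processing.

```python
import torch
import torch.nn.functional as F


def box_blur(images: torch.Tensor, kernel_size: int = 3) -> torch.Tensor:
    """Blur a float (N, C, H, W) batch on the device it already lives on.

    Hypothetical helper for illustration; kernel_size is assumed to be odd.
    """
    channels = images.shape[1]
    # One averaging kernel per channel, created on the input's device and dtype.
    weight = torch.full(
        (channels, 1, kernel_size, kernel_size),
        1.0 / (kernel_size * kernel_size),
        device=images.device,
        dtype=images.dtype,
    )
    pad = kernel_size // 2
    # Replicate-pad the borders so the output keeps the input height and width.
    padded = F.pad(images, (pad, pad, pad, pad), mode="replicate")
    # groups=channels applies the kernel to each channel independently.
    return F.conv2d(padded, weight, groups=channels)


# Falls back to the CPU when no CUDA device is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
gpu_tensor = torch.rand(1, 3, 64, 64, device=device)  # stand-in for the network output
blurred = box_blur(gpu_tensor)  # no .cpu() / upload() round trip
```

This only covers operations that can be expressed as tensor ops; it does not help with OpenCV CUDA routines that have no obvious PyTorch equivalent.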