How to cast (not convert) the data type of a tensor in C++?

I’m implementing a custom CUDA operation that involves type casting.
What I want to achieve, written in Python, looks like this:

>>> a = torch.ones(10).to('cuda')
>>> a
tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], device='cuda:0')
>>> a.dtype
torch.float32
>>> MyModule.myCustomOp(a, scaling_factor=100)
>>> a
tensor([100, 100, 100, 100, 100, 100, 100, 100, 100, 100], device='cuda:0', dtype=torch.int32)

A simplified version of my C++ code looks like this:

__global__ void my_kernel(float* input, float scaling_factor) {
  auto index = blockIdx.x * blockDim.x + threadIdx.x;  // (bounds check omitted for brevity)
  auto integer = int32_t(roundf(input[index] * scaling_factor));
  // some other logic, not important here
  // reinterpret the float buffer as int32_t and write the result in place
  int32_t* int_arr = reinterpret_cast<int32_t*>(input);
  int_arr[index] = integer;
}

void myCustomOp(torch::Tensor input, float scaling_factor) {
  my_kernel<<<GRID_DIM, BLOCK_DIM>>>(input.data_ptr<float>(), scaling_factor);
  CUDA_CHECK(cudaDeviceSynchronize());
  // how to cast the type of input to int32_t?
}

My implementation writes int32 bit patterns into the buffer but does not change the dtype of the input tensor, so PyTorch still interprets the memory as torch.float32 and I get meaningless values.
An alternative is to create a new int32 tensor and return it, but I’d like to do this in-place if possible.
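For context, the out-of-place semantics I’m after are easy to state; here is a minimal Python sketch (on CPU just for illustration, and `my_custom_op_reference` is a made-up name, not my actual op):

```python
import torch

def my_custom_op_reference(a: torch.Tensor, scaling_factor: float) -> torch.Tensor:
    # Out-of-place reference: scale, round, and convert to int32.
    # The real op does the same arithmetic inside a CUDA kernel.
    return torch.round(a * scaling_factor).to(torch.int32)

a = torch.ones(10)
print(my_custom_op_reference(a, 100))  # int32 tensor of 100s
```

This allocates a second buffer, which is exactly what I’m trying to avoid.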

Any suggestions, please?