Transfer uint8_t[] to at::Tensor@CUDA and back

I am writing a small C++ module whose input and output interfaces are images in uint8_t[] format. Currently I am performing a lot of conversion steps. Can this be done more efficiently?

  • at::Tensor t = torch::from_blob(data_in, sizes, at::kByte), with data_in being the uint8_t[] input buffer
  • t = t.to(at::kCUDA)
  • t = t.to(at::kFloat)
  • t = at::transpose(t,1,2)
  • t = model.forward(std::vector<torch::jit::IValue>{t}).toTensorList().get(0)
  • t = t.round()
  • t = at::transpose(t,2,1)
  • t = t.to(at::kByte)
  • t = t.to(at::kCPU)
  • memcpy(data_out, t.data_ptr(), t.numel()*sizeof(uint8_t)), with data_out being the uint8_t[] output buffer
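
For reference, here is roughly what these steps look like as a single compilable function. This is a minimal sketch, not my exact code: the run_pipeline name, the C/H/W dimensions, and the tensor layout are placeholders, and the model is assumed to be a loaded TorchScript module.

    // Minimal sketch of the pipeline above (C, H, W stand in for the real
    // image dimensions; run_pipeline is just a placeholder name).
    #include <torch/script.h>
    #include <cstdint>
    #include <cstring>
    #include <vector>

    void run_pipeline(torch::jit::Module& model,
                      uint8_t* data_in, uint8_t* data_out,
                      int64_t C, int64_t H, int64_t W) {
      // Wrap the input bytes without copying (from_blob does not take ownership).
      at::Tensor t = torch::from_blob(data_in, {C, H, W}, at::kByte);

      // Device and dtype can be changed in a single to() call.
      t = t.to(at::TensorOptions().device(at::kCUDA).dtype(at::kFloat));

      // Swap dims 1 and 2, as in the step list above.
      t = at::transpose(t, 1, 2);

      // Run the TorchScript model.
      t = model.forward(std::vector<torch::jit::IValue>{t}).toTensorList().get(0);

      // Round, transpose back, convert to uint8 on the CPU, and make the
      // result contiguous so the memcpy below copies the expected layout.
      t = at::transpose(t.round(), 2, 1)
              .to(at::TensorOptions().device(at::kCPU).dtype(at::kByte))
              .contiguous();

      // Copy the result into the caller-provided output buffer.
      std::memcpy(data_out, t.data_ptr<uint8_t>(), t.numel() * sizeof(uint8_t));
    }

In this form the CUDA/float conversions and the byte/CPU conversions each collapse into one to() call, but the upload, the download, the transposes, and the final memcpy are all still there, which is what I would like to reduce if possible.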