I am writing a small C++ module whose input and output interfaces are images in uint8_t[] format. Currently, I am performing a lot of conversion steps. Can this be done more efficiently?
- at::Tensor t = torch::from_blob(uint8_t[] data_in, at::kByte)
- t = t.to(at::kCUDA)
- t = t.to(at::kFloat)
- t = at::transpose(t,1,2)
- t = model.forward(std::vector<torch::jit::IValue>{t}).toTensorList().get(0)
- t = t.round()
- t = at::transpose(t,2,1)
- t = t.to(at::kByte)
- t = t.to(at::kCPU)
- memcpy(uint8_t[] data_out, t.data_ptr(), t.numel()*sizeof(uint8_t))
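
For reference, the round and to(at::kByte) steps above amount to a per-element float to uint8_t conversion. The sketch below shows that conversion in plain standard C++, not libtorch. The helper names float_to_u8 and floats_to_u8 are hypothetical, and the clamp to [0, 255] is an assumption that the steps above do not include. It is added because converting an out-of-range float to an 8-bit integer is undefined behavior in C++, so a model output slightly outside the range may not saturate as expected.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper: round to the nearest integer and saturate to
// [0, 255] before casting, so the cast itself is always well defined.
inline std::uint8_t float_to_u8(float v) {
    if (std::isnan(v)) {
        return 0;  // assumption: map NaN to 0
    }
    // std::nearbyint rounds halfway cases to even under the default
    // floating-point rounding mode.
    float r = std::nearbyint(v);
    r = std::min(255.0f, std::max(0.0f, r));
    return static_cast<std::uint8_t>(r);
}

// Hypothetical helper: convert a whole float buffer to bytes.
inline std::vector<std::uint8_t> floats_to_u8(const std::vector<float>& in) {
    std::vector<std::uint8_t> out(in.size());
    std::transform(in.begin(), in.end(), out.begin(), float_to_u8);
    return out;
}
```

If the model output is already guaranteed to lie in [0, 255], the clamp is redundant and can be dropped.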