Exporting a Custom CUDA Op to ONNX

I have a custom CUDA op that I want to export to ONNX and later to TensorRT. The original implementation of this op accepted the output tensor as a reference parameter:

void my_custom_cuda_op(
    bool attribute_1,
    const torch::Tensor &input_1,
    const torch::Tensor &input_2,
    const torch::Tensor &input_3,
    torch::Tensor &output)
{
    // ... kernel launch writes the result into `output` in place
}
When exporting to ONNX, the op is not traced properly. If I don't pass the output tensor, but instead create it inside the op and return it, the op does show up in the ONNX graph. I guess the issue is that the op is in-place?
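For reference, this is roughly what the variant that does trace looks like. A minimal sketch, assuming the output has the same shape and dtype as `input_1`; `launch_my_kernel` stands in for the existing CUDA launcher and is a hypothetical name:

```cpp
#include <torch/extension.h>

// Non-in-place variant: allocate the output inside the op and return it
// by value. The tracer then records the op as a node that produces a new
// tensor, rather than a mutation of a tensor created outside the op.
torch::Tensor my_custom_cuda_op(
    bool attribute_1,
    const torch::Tensor &input_1,
    const torch::Tensor &input_2,
    const torch::Tensor &input_3)
{
    // Assumption: output shape/dtype match input_1; adjust as needed.
    torch::Tensor output = torch::empty_like(input_1);
    // Hypothetical call into the existing CUDA kernel launcher:
    // launch_my_kernel(attribute_1, input_1, input_2, input_3, output);
    return output;
}
```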