How to write a C++ extension operator that returns a view sharing the input's memory

I want to write a custom operator in C++/CUDA that places the input tensor's values into the output tensor according to a complex rule. The output tensor is usually much larger than the input and contains many duplicated values, so allocating an empty tensor and copying the values into it is not memory-efficient.

The desired behavior is somewhat similar to torch.broadcast_to:

import torch

x = torch.ones(3, 1)
y = torch.ops.aten.broadcast_to(x, (3, 5))
y[0, 0] = 0.5
print('y:', y)
print('x:', x)


y: tensor([[0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
           [1.0000, 1.0000, 1.0000, 1.0000, 1.0000],
           [1.0000, 1.0000, 1.0000, 1.0000, 1.0000]])
x: tensor([[0.5000],
           [1.0000],
           [1.0000]])
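
For reference, the sharing in the example above seems to come purely from strides: the view reuses the storage of x and sets the stride of the duplicated dimension to 0. Below is a minimal sketch that reproduces it with Tensor.as_strided. The stride tuple (1, 0) is my assumption for a contiguous x of shape (3, 1); it is not something I have verified for other layouts.

```python
import torch

x = torch.ones(3, 1)

# Assumption: x is contiguous, so its rows are 1 element apart in storage.
# A stride of 0 along dim 1 makes all 5 columns of a row alias one element of x.
y = x.as_strided(size=(3, 5), stride=(1, 0))

# Writing through the view modifies the single underlying element of x.
y[0, 0] = 0.5

print("same storage:", y.data_ptr() == x.data_ptr())
print("x[0, 0]:", x[0, 0].item())
print("expand strides:", x.expand(3, 5).stride())
```

This only covers duplication patterns that can be described by one size/stride/offset triple. My actual rule is more complex than that, which is why I am asking about a custom operator.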

Can someone please point me to a tutorial on writing a C++/CUDA extension that returns a view sharing the input's memory instead of a copy? Or is there a Pythonic solution?