Indexing into a recursively generated "tensor of tensors"

I would like to be able to fill in the elements of a tensor recursively and then compute the gradient with respect to an element for backpropagation. Currently, my code (using Python lists) looks something like this:

import torch
from torch import Tensor
from typing import List

# batch_size and some_fn are defined elsewhere in my code.
def function(length: int, idx: Tensor) -> Tensor:
    _dummy: Tensor = torch.empty(0)
    x: List[Tensor] = [_dummy for _ in range(length)]  # placeholder list
    x[0] = torch.zeros(batch_size)
    for i in range(1, length):
        x[i] = some_fn(x[i - 1])                       # each element depends on the previous one
    return torch.stack([x[i] for i in idx.tolist()])   # Slow step!

However, this runs slower on GPU than on CPU, and I suspect the Python list is to blame, since indexing into it forces synchronization between CPU and GPU (RNN training works just fine, so the GPU itself is not the bottleneck). Given that I already know the length of my array and the indices I need in advance, I’d like to do something more akin to this:

def vectorized_function(length: int, idx: Tensor) -> Tensor:
    t: Tensor = torch.empty(batch_size, length)
    t[:, 0] = 0                             # in-place write
    for i in range(1, length):
        t[:, i] = some_fn(t[:, i - 1])      # in-place write into the preallocated tensor
    return t.index_select(1, idx)           # select the needed columns along dim 1

However, as I understand it, this won’t work, because autograd cannot compute the gradient of t through the in-place assignments. Is there any way around this? (The snippet below shows the kind of failure I am worried about.)
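A tiny, unrelated reproduction of that failure mode (not my actual code): an in-place write into a tensor that autograd has already saved for the backward pass makes backward() raise.

import torch

w = torch.randn(3, requires_grad=True)
y = w.exp()           # autograd saves exp's result for the backward pass
y[0] = 0.0            # in-place write modifies that saved result
y.sum().backward()    # RuntimeError: one of the variables needed for gradient
                      # computation has been modified by an inplace operation

My worry is that the slice assignments into t in vectorized_function hit the same check.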

Are you getting an “inplace error” or is this just a concern?
If an in-place operation actually breaks autograd, PyTorch will raise an error, so you could simply try running your code as written; since t is a newly initialized tensor, the in-place writes may well be fine.
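For example, here is a minimal self-contained sketch of that pattern; the names (step, the sizes, the indices) are placeholders, and a learnable per-batch step stands in for some_fn. It should backpropagate through the in-place column writes without tripping the check, since the ops used here (slicing and addition) do not need to keep t itself for their backward; a some_fn that does save its input could still raise the error above, and then you would see it immediately.

import torch

batch_size, length = 4, 8
idx = torch.tensor([2, 5, 7])

# Hypothetical stand-in for some_fn: add a learnable per-batch step.
step = torch.randn(batch_size, requires_grad=True)

t = torch.empty(batch_size, length)
t[:, 0] = 0.0                       # in-place write of a constant
for i in range(1, length):
    t[:, i] = t[:, i - 1] + step    # in-place write of a grad-requiring value

out = t.index_select(1, idx)        # pick the columns of interest
out.sum().backward()                # no in-place error for this some_fn
print(step.grad)                    # t[:, i] == i * step, so each entry is 2 + 5 + 7 = 14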

I’m an idiot: I think something else was broken the first time I tried it, and after refactoring it works. Thank you so much!