Diagonal repetition of a matrix?

Hi,
Given an m*n input matrix (tensor), e.g.

A B
C D

I want to repeat it diagonally into an (m*p)*(n+p-1) matrix:

A B 0 0
0 A B 0
0 0 A B
C D 0 0
0 C D 0
0 0 C D

where (m, n, p) are provided during init().
How can I do this in PyTorch with autograd support? Thank you!

OK, so I couldn't find a super easy or efficient way to do this (if people have better solutions, I'd love to see them), but here is what I came up with:

import torch
import torch.nn.functional as F

def block_diag(x, p):
    """Repeats a tensor diagonally p times, as specified by @zfzhang."""
    outputs = []
    for row in x:  # build one (p, n + p - 1) block per input row
        out = torch.zeros(p, p + x.shape[1] - 1)
        for ii, elem in enumerate(row):
            # p copies of the element along the diagonal of a p x p matrix
            d = torch.diag(elem.expand(p))
            # shift the diagonal right by the element's column index
            padded = F.pad(d, pad=(ii, x.shape[1] - ii - 1))
            out = out + padded
        outputs.append(out)
    return torch.cat(outputs)
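
As a quick sanity check, here it is on the 2*2 matrix from the question with p = 3 (expected output shown as comments, matching the layout asked for):

x = torch.tensor([[1., 2.],
                  [3., 4.]])
print(block_diag(x, 3))
# tensor([[1., 2., 0., 0.],
#         [0., 1., 2., 0.],
#         [0., 0., 1., 2.],
#         [3., 4., 0., 0.],
#         [0., 3., 4., 0.],
#         [0., 0., 3., 4.]])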

This method works, but is… not fast:

In [1]: x = torch.rand(10, 10)

In [2]: %timeit block_diag(x, 2)
1.78 ms ± 53.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [3]: %timeit block_diag(x, 10)
1.76 ms ± 60.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [4]: x = torch.rand(50, 50)

In [5]: %timeit block_diag(x, 2)
43.2 ms ± 2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [6]: %timeit block_diag(x, 10)
44.9 ms ± 1.65 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

The method should work with autograd, but may be infeasibly slow, depending on the size of your matrix! Hope that helps a bit. Again, if someone else has a better solution, please share, I’m very curious!
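
One idea for a possibly faster variant (just a sketch, I haven't benchmarked it against the version above): since each copy of a row is simply shifted one column to the right, you can pad each row with p zeros, tile it p times, and reshape, using only differentiable tensor ops and no Python loops:

import torch
import torch.nn.functional as F

def block_diag_vectorized(x, p):
    """Sketch of a loop-free version; intended to match block_diag(x, p)."""
    m, n = x.shape
    # pad each row with p zeros on the right: [A, B] -> [A, B, 0, 0, 0]
    padded = F.pad(x, (0, p))                     # (m, n + p)
    # tile each padded row p times, then drop the trailing p zeros
    tiled = padded.repeat(1, p)[:, :p * (n + p - 1)]
    # each trimmed row now folds into p right-shifted copies of itself
    return tiled.reshape(m * p, n + p - 1)

On the 2*2 example this gives the same result as block_diag(x, 3), and pad/repeat/reshape are all autograd-friendly.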