Laying out vector values into a matrix - LDL decomposition

Hello,

I am predicting a set of parameters that generates the LDL decomposition of a positive definite matrix.

Given an NxN matrix, instead of estimating the full matrices I decided to estimate the sets of elements l and d needed to re-compose the L and D matrices (e.g. 3 for L and 3 for D in the 3x3 case; note that these counts do not scale linearly with N).

While for D I just have to put d on the diagonal, the situation for L is a little more complex. The number of elements in l is a triangular number, N*(N-1)/2, so the side length of L can be recovered as -0.5+(0.25+2*l.size(1))**(1/2)+1. L is a lower triangular matrix with ones on the main diagonal (see LDL decomposition for more details).
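
For instance, the two easy pieces would look like this (a minimal sketch, where l and d are the predicted vectors from above and torch.diag_embed builds the batched diagonal matrix):

import torch

m = l.size(1)                                # number of strictly lower-triangular entries, N*(N-1)/2
N = int(-0.5 + (0.25 + 2 * m) ** 0.5 + 1)    # e.g. m = 3 gives N = 3
D = torch.diag_embed(d)                      # d of shape (B, N) -> D of shape (B, N, N)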

Now the problem lies in putting the components of the l vector in L.

I made a generic function that does that:

import torch

def batch_l_to_L(l_vec):
    # Recover N from the number of strictly lower-triangular entries:
    # l_vec.size(1) == N * (N - 1) / 2
    L_size = -0.5 + (0.25 + 2 * l_vec.size(1)) ** (1 / 2)
    L_size = L_size + 1
    # Start from an identity matrix, replicated once per batch element
    L = torch.eye(int(L_size))
    L = L.unsqueeze(0)
    L = L.repeat(l_vec.size(0), 1, 1)

    # Fill the strictly lower triangle row by row
    for b in range(l_vec.size(0)):
        it_count = 0
        for j in range(1, L.size(1)):
            for k in range(0, j):
                L[b, j, k] = l_vec[b, it_count]
                it_count = it_count + 1

    return L

but I am not sure everything goes well with respect to autograd.

Question 1: is this approach alright, or do I perform some operation that does not preserve gradients?
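
A minimal check I am planning to run (assuming index assignment does propagate gradients, l_vec.grad should come out populated):

l_vec = torch.randn(2, 3, requires_grad=True)   # batch of 2, N = 3
L = batch_l_to_L(l_vec)
L.sum().backward()
print(l_vec.grad)   # should be a (2, 3) tensor of ones if gradients survive the copies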

I checked the graph using PyTorchViz and it comes out as a huge graph (it of course incorporates the network and other parts of the loss, but this function contributes the biggest part).

Then I wrote out the layout manually for the N=3 case, which takes only a few lines of code:

L = torch.eye(int(-0.5 + (0.25 + 2 * l.size(1)) ** (1 / 2)) + 1)
L = L.repeat(l.size(0), 1, 1)
L[:, 1, 0] = l[:, 0]
L[:, 2, 0] = l[:, 1]
L[:, 2, 1] = l[:, 2]

and in this case the graph is way smaller.

Question 2: why is this the case? Are gradients preserved this time as well?

Edit: I think the difference between the graphs comes from the fact that the function iterates over the batch dimension as well, while the manual version uses slices. Still, I would like to know whether this is the most efficient way and whether this implementation causes any trouble for autograd.
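
For context, the fully vectorized variant I am considering instead, assuming torch.tril_indices (offset=-1 picks the strictly lower triangle, and its row-major ordering matches the loop order in my function):

N = int(-0.5 + (0.25 + 2 * l.size(1)) ** 0.5 + 1)
idx = torch.tril_indices(N, N, offset=-1)     # rows [1, 2, 2], cols [0, 0, 1] for N = 3
L = torch.eye(N).repeat(l.size(0), 1, 1)
L[:, idx[0], idx[1]] = l                      # one batched assignment, no Python loops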