Hello,

I am predicting a set of parameters that generate the LDL decomposition of a positive definite matrix.

Given an NxN matrix, instead of estimating the full matrices I decided to estimate the sets of elements `l` and `d` necessary to re-compose the L and D matrices (e.g. 3 for `L` and 3 for `D` in the 3x3 case; note that this does not scale linearly for larger `N`).
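To make the setup concrete, this is the recomposition I have in mind for the 3x3 case (a minimal sketch with made-up numbers; `l` fills the strict lower triangle of `L` row by row):

```python
import torch

# Made-up parameters for the 3x3 case: 3 diagonal entries and 3 lower entries.
d = torch.tensor([1.0, 2.0, 3.0])
l = torch.tensor([0.1, 0.2, 0.3])

D = torch.diag(d)
L = torch.tensor([[1.0, 0.0, 0.0],
                  [0.1, 1.0, 0.0],
                  [0.2, 0.3, 1.0]])  # unit diagonal, l in the strict lower triangle

A = L @ D @ L.T  # positive definite as long as all entries of d are positive
```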

While for `D` I just have to put `d` on the diagonal, the situation for `L` is a little more complex. The number of elements in `l` is a triangular number, N(N-1)/2, so the side length of `L` can be recovered as `-0.5+(0.25+2*l.size(1))**(1/2)+1`. The matrix is a lower triangular matrix with ones on the main diagonal (see LDL decomposition for more info).

Now the problem lies in putting the components of the `l` vector into `L`. I made a generic function that does that:

```python
import torch

def batch_l_to_L(l_vec):
    # Recover the side length N of L from the number of strictly
    # lower-triangular elements, N*(N-1)/2, stored in l_vec.
    L_size = -0.5 + (0.25 + 2 * l_vec.size(1)) ** (1 / 2)
    L_size = L_size + 1
    # Start from the identity (unit diagonal) and repeat along the batch.
    L = torch.eye(int(L_size))
    L = L.unsqueeze(0)
    L = L.repeat(l_vec.size(0), 1, 1)
    # Fill the strict lower triangle row by row, sample by sample.
    for b in range(l_vec.size(0)):
        it_count = 0
        for j in range(1, L.size(1)):
            for k in range(0, j):
                L[b, j, k] = l_vec[b, it_count]
                it_count = it_count + 1
    return L
```
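A quick smoke test (dummy batch of 4 with N=3, made-up values):

```python
l = torch.randn(4, 3, requires_grad=True)  # 4 samples, N*(N-1)/2 = 3 entries each
L = batch_l_to_L(l)
print(L.shape)    # torch.Size([4, 3, 3])
print(L.grad_fn)  # should print a CopySlices grad_fn if the writes are tracked
```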

but I am not sure everything goes well wrt autograd.

*Question 1: is this approach alright, or do I perform some operation that does not preserve gradients?*

I checked the graph using PyTorchViz and it comes out as a huge graph (of course it incorporates the network and other parts of the loss, but this function contributes the biggest part).

Then I wrote the layout manually for the case where `N=3`, which takes three lines of code:

```python
L = torch.eye(int(-0.5 + (0.25 + 2 * l.size(1)) ** (1 / 2)) + 1)
L = L.repeat(l.size(0), 1, 1)
L[:, 1, 0] = l[:, 0]
L[:, 2, 0] = l[:, 1]
L[:, 2, 1] = l[:, 2]
```

and in this case the graph is way smaller.

*Question 2: why is this the case? Do I preserve gradients this time as well?*

Edit: I think the difference in the graphs comes from the fact that in the function I iterate over the batch dimension as well, while I use slices below. Still, I would like to know if this is the most efficient way and whether this implementation causes any trouble for autograd.
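For reference, here is a fully vectorized variant I am considering, which avoids both the Python loops and the per-sample writes. It is only a sketch: it assumes `torch.tril_indices` enumerates the strict lower triangle in the same row-major order my loops use (which matches its documented behavior, as far as I can tell).

```python
import torch

def batch_l_to_L_vectorized(l_vec):
    # Invert the triangular number N*(N-1)/2 to recover the side length N.
    n = int(round(0.5 + (0.25 + 2 * l_vec.size(1)) ** 0.5))
    # Row/column indices of the strict lower triangle, in row-major order.
    rows, cols = torch.tril_indices(n, n, offset=-1)
    L = torch.eye(n, dtype=l_vec.dtype, device=l_vec.device)
    L = L.repeat(l_vec.size(0), 1, 1)
    # One advanced-indexing write covers the whole batch at once.
    L[:, rows, cols] = l_vec
    return L
```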