Does this parametrized model ansatz from interferometric optics break the graph or backward()?

Hi! I am rather new to PyTorch, and I need to implement a parametrized model ansatz from interferometer optics instead of the usual neural network ansatz.

The ansatz is called the Clements layout. It takes real-valued numbers (the phases) as input and produces a complex-valued matrix (a unitary transformation).

The basic element it relies on is a 2x2 matrix:

def MZI(phases):
    # phases[0]: phase shift, phases[1]: mixing angle
    M = torch.tensor([[torch.exp(1j*phases[0])*torch.cos(phases[1]), -torch.sin(phases[1])],
                      [torch.exp(1j*phases[0])*torch.sin(phases[1]),  torch.cos(phases[1])]])
    return M
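For reference, in math terms each such block is meant to implement

$$M(\phi_0,\phi_1) = \begin{pmatrix} e^{i\phi_0}\cos\phi_1 & -\sin\phi_1 \\ e^{i\phi_0}\sin\phi_1 & \cos\phi_1 \end{pmatrix}$$

with $\phi_0$ = phases[0] and $\phi_1$ = phases[1].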

These 2x2-matrices get used to define a big unitary matrix that serves as one layer:

def ClementsLayer(dim, withSkip, phases):
    if withSkip == 0:
        if dim % 2 == 0:
            for k in range(dim//2):
                if k == 0:
                    M = MZI(phases[0:2])
                else:
                    M = torch.block_diag(M, MZI(phases[k:k+2]))
        if dim % 2 == 1:
            for k in range((dim-1)//2):
                if k == 0:
                    M = MZI(phases[0:2])
                else:
                    M = torch.block_diag(M, MZI(phases[k:k+2]))
            M = torch.block_diag(M, torch.ones(size=[1,1]))
    if withSkip == 1:
        if dim % 2 == 0:
            M = torch.ones(size=[1,1])
            for k in range((dim-2)//2):
                M = torch.block_diag(M, MZI(phases[k:k+2]))
            M = torch.block_diag(M, torch.ones(size=[1,1]))
        if dim % 2 == 1:
            M = torch.ones(size=[1,1])
            for k in range((dim-1)//2):
                M = torch.block_diag(M, MZI(phases[k:k+2]))
    return M
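To illustrate: for dim = 4, a withSkip = 0 layer is block_diag(MZI, MZI) acting on the mode pairs (0,1) and (2,3), while a withSkip = 1 layer is block_diag(1, MZI, 1), i.e. it acts on the shifted pair (1,2) and leaves the first and last mode alone. A quick shape check (toy phases, nothing special about the values):

phases = torch.randn(4)
print(ClementsLayer(4, withSkip=0, phases=phases).shape)  # torch.Size([4, 4])
print(ClementsLayer(4, withSkip=1, phases=phases).shape)  # torch.Size([4, 4])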

These layers get multiplied to obtain the final model ansatz:

def Clements(dim, ClementsPhases, outputPhases):
    M = torch.eye(dim, dtype=torch.cfloat)
    for k in range(dim):
        M = torch.matmul(ClementsLayer(dim, withSkip=k % 2, phases=ClementsPhases[k,:]), M)
    M = torch.matmul(torch.diag(torch.exp(1j*outputPhases)), M)
    return M
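For dim = 5 I then call this with a [5, 5] tensor of layer phases and a [5] tensor of output phases (the same shapes as in the jacobian check further below), e.g.

ClementsPhases = torch.randn(5, 5)
outputPhases = torch.randn(5)
U = Clements(5, ClementsPhases, outputPhases)   # 5x5 complex matrix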

Now I am confronted with the problem that backward() and optimizer.step() don't change the parameters in ClementsPhases, even though I did declare them as PyTorch parameters using nn.Parameter().

I have already tried to narrow down what the problem might be.
On a toy problem, I saw that the parameters in outputPhases do change, while the parameters in ClementsPhases don't.
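For reference, the toy setup looks roughly like this (a simplified sketch; the target and loss here are only placeholders, not my actual objective):

import torch
import torch.nn as nn

class ClementsModel(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # one row of phases per layer, plus the final output phases
        self.ClementsPhases = nn.Parameter(torch.randn(dim, dim))
        self.outputPhases = nn.Parameter(torch.randn(dim))
        self.dim = dim

    def forward(self):
        return Clements(self.dim, self.ClementsPhases, self.outputPhases)

MyModel = ClementsModel(5)
optimizer = torch.optim.Adam(MyModel.parameters(), lr=0.01)
target = torch.eye(5, dtype=torch.cfloat)   # placeholder target

optimizer.zero_grad()
loss = (MyModel() - target).abs().pow(2).sum()   # placeholder loss
loss.backward()
optimizer.step()

With a training step like this, outputPhases moves but ClementsPhases stays exactly the same.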

If I use the command

for param in MyModel.parameters():
    print(param.grad)

after an optimizer step, I get an output of the form:

None
tensor([-1.3310e-04, -5.3921e-04, -8.4813e-05,  2.1805e-02, -2.1048e-02])

Here, by parameter counting, the tensor with 5 numbers in it should be the gradient for outputPhases, while the None then presumably refers to ClementsPhases.
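To make the mapping explicit I could of course also print the parameter names, e.g.

for name, param in MyModel.named_parameters():
    print(name, param.grad)

but the parameter count already seems unambiguous.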

If I directly try the command

torch.autograd.functional.jacobian(
    Clements5,
    (torch.randn(size=[5,5], dtype=torch.float32), torch.randn(size=[5], dtype=torch.float32))
)

where Clements5 is a copy of Clements with the dimension fixed to 5 (a minimal version is sketched below), I get an output of the form

(tensor(<lots of zero matrices>), tensor(<matrices that are non-zero in exactly one column each>))

The second tensor is probably for outputPhases again (that would explain why exactly one column is non-zero). The many zero matrices are then probably the Jacobians with respect to ClementsPhases.
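For completeness, Clements5 is just a thin wrapper so that jacobian sees a function of the two phase tensors; a minimal version would be

def Clements5(ClementsPhases, outputPhases):
    return Clements(5, ClementsPhases, outputPhases)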

So it appears to me that PyTorch cannot calculate gradients for my ClementsPhases parameters, and I don't know why this might be the case. Do my if statements or loops break the computational graph or backward()? Or is something else going wrong?