Hi! I am rather new to PyTorch and I need to implement a parametrized model ansatz from interferometer optics instead of the usual neural network ansatz.
The ansatz is called the Clements layout. It takes as input real-valued numbers (the phases), and produces a complex-valued matrix (unitary transformation).
The basic element it relies on is a 2x2-matrix:
```python
def MZI(phases):
    M = torch.tensor([[torch.exp(1j*phases)*torch.cos(phases), -torch.sin(phases)],
                      [torch.exp(1j*phases)*torch.sin(phases),  torch.cos(phases)]])
    return M
```
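As a sanity check on this building block, here is a minimal probe of whether the output of MZI stays connected to the autograd graph (MZI copied from above, called with a single scalar phase just for the probe):

```python
import torch

def MZI(phases):
    M = torch.tensor([[torch.exp(1j*phases)*torch.cos(phases), -torch.sin(phases)],
                      [torch.exp(1j*phases)*torch.sin(phases),  torch.cos(phases)]])
    return M

phase = torch.tensor(0.3, requires_grad=True)
M = MZI(phase)
# If this prints "False None", the graph is already broken inside MZI:
print(M.requires_grad, M.grad_fn)
```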
These 2x2-matrices get used to define a big unitary matrix that serves as one layer:
```python
def ClementsLayer(dim, withSkip, phases):
    if withSkip == 0:
        if dim % 2 == 0:
            for k in range(dim//2):
                if k == 0:
                    M = MZI(phases[0:2])
                else:
                    M = torch.block_diag(M, MZI(phases[k : k+2]))
        if dim % 2 == 1:
            for k in range((dim-1)//2):
                if k == 0:
                    M = MZI(phases[0:2])
                else:
                    M = torch.block_diag(M, MZI(phases[k : k+2]))
            M = torch.block_diag(M, torch.ones(size=[1, 1]))
    if withSkip == 1:
        if dim % 2 == 0:
            M = torch.ones(size=[1, 1])
            for k in range((dim-2)//2):
                M = torch.block_diag(M, MZI(phases[k:k+2]))
            M = torch.block_diag(M, torch.ones(size=[1, 1]))
        if dim % 2 == 1:
            M = torch.ones(size=[1, 1])
            for k in range((dim-1)//2):
                M = torch.block_diag(M, MZI(phases[k:k+2]))
    return M
```
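Since ClementsLayer is essentially repeated calls to torch.block_diag, I also checked in isolation that block_diag itself propagates gradients when the blocks are built with differentiable ops (a minimal sketch with a hand-built rotation built via torch.stack, not my actual code):

```python
import torch

theta = torch.tensor(0.3, requires_grad=True)
# Build a 2x2 rotation from differentiable ops:
A = torch.stack([torch.cos(theta), -torch.sin(theta),
                 torch.sin(theta),  torch.cos(theta)]).reshape(2, 2)
M = torch.block_diag(A, torch.eye(1))
M.sum().backward()
print(theta.grad)  # -2*sin(0.3) ≈ -0.5910, so block_diag itself is fine
```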
These layers get multiplied to obtain the final model ansatz:
```python
def Clements(dim, ClementsPhases, outputPhases):
    M = torch.eye(dim, dtype=torch.cfloat)
    for k in range(dim):
        M = torch.matmul(ClementsLayer(dim, withSkip=k % 2, phases=ClementsPhases[k, :]), M)
    M = torch.matmul(torch.diag(torch.exp(1j*outputPhases)), M)
    return M
```
Now I am confronted with the problem that backward() and optimizer.step() don't change the parameters in ClementsPhases, even though I declared them as PyTorch parameters using nn.Parameter().
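For reference, the parameter registration in my module looks roughly like this (class name, attribute names, and shapes are just a sketch; Clements is the function above and is not exercised here):

```python
import torch
import torch.nn as nn

class ClementsModel(nn.Module):  # hypothetical wrapper, shapes illustrative
    def __init__(self, dim):
        super().__init__()
        self.dim = dim
        self.ClementsPhases = nn.Parameter(torch.randn(dim, dim))
        self.outputPhases = nn.Parameter(torch.randn(dim))

    def forward(self):
        # Clements as defined above
        return Clements(self.dim, self.ClementsPhases, self.outputPhases)
```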
I already tried to narrow down what the problem might be.
For a toy problem, I saw that the parameters in outputPhases do change, while the parameters in ClementsPhases don’t change.
If I use the command
```python
for param in MyModel.parameters():
    print(param.grad)
```
after an optimizer step, I get an output of the form:
```
None
tensor([-1.3310e-04, -5.3921e-04, -8.4813e-05,  2.1805e-02, -2.1048e-02])
```
Here, by parameter counting, the tensor with 5 numbers should be the gradient for outputPhases, while the None then probably refers to ClementsPhases.
If I directly try the command
```python
torch.autograd.functional.jacobian(
    Clements5,
    (torch.randn(size=[5, 5], dtype=torch.float32),
     torch.randn(size=[5], dtype=torch.float32)))
```
where Clements5 is a copy of Clements that has the dimension fixed to 5, I get an output of the form
(tensor(lots of zero matrices), tensor(matrices that are non-zero in exactly one column each))
The second tensor is probably for outputPhases again (that would explain why exactly one column is non-zero). Then the many zero-matrices are probably for ClementsPhases.
So it appears to me that PyTorch cannot calculate gradients for my ClementsPhases parameters, and I don't know why this might be the case. Do my if-statements or loops break the gradient graph or the backward() call? Or is something else going wrong?