Hi! I am rather new to PyTorch and I need to implement a parametrized model ansatz from interferometer optics instead of the usual neural network ansatz.
The ansatz is called the Clements layout. It takes real-valued numbers (the phases) as input and produces a complex-valued matrix (a unitary transformation).
The basic element it relies on is a 2x2-matrix:
import torch

def MZI(phases):
    # 2x2 Mach-Zehnder block: phases[0] is the phase shift, phases[1] the mixing angle
    M = torch.tensor([[ torch.exp(1j*phases[0])*torch.cos(phases[1]) , -torch.sin(phases[1]) ],
                      [ torch.exp(1j*phases[0])*torch.sin(phases[1]) ,  torch.cos(phases[1]) ]])
    return M
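Just to illustrate how I call it (the phase values here are arbitrary): this returns a 2x2 complex matrix that is unitary up to floating point precision.
phi = torch.tensor([0.3, 0.7])
U2 = MZI(phi)
print(U2)
print(U2 @ U2.conj().T)   # approximately the 2x2 identity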
These 2x2-matrices get used to define a big unitary matrix that serves as one layer:
def ClementsLayer(dim, withSkip, phases):
    # one layer of the mesh: a block-diagonal matrix of 2x2 MZI blocks,
    # either starting at mode 0 (withSkip == 0) or at mode 1 (withSkip == 1)
    if withSkip == 0:
        if dim % 2 == 0:
            for k in range(dim//2):
                if k == 0:
                    M = MZI(phases[0:2])
                else:
                    M = torch.block_diag(M, MZI(phases[k : k+2]))
        if dim % 2 == 1:
            for k in range((dim-1)//2):
                if k == 0:
                    M = MZI(phases[0:2])
                else:
                    M = torch.block_diag(M, MZI(phases[k : k+2]))
            # the last mode is passed through unchanged
            M = torch.block_diag(M, torch.ones(size = [1,1]))
    if withSkip == 1:
        if dim % 2 == 0:
            # the first and last modes are passed through unchanged
            M = torch.ones(size = [1,1])
            for k in range((dim-2)//2):
                M = torch.block_diag(M, MZI(phases[k:k+2]))
            M = torch.block_diag(M, torch.ones(size = [1,1]))
        if dim % 2 == 1:
            # the first mode is passed through unchanged
            M = torch.ones(size = [1,1])
            for k in range((dim-1)//2):
                M = torch.block_diag(M, MZI(phases[k:k+2]))
    return M
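For example, for dim = 5 both variants produce a 5x5 matrix: without skip the MZI blocks sit on modes (0,1) and (2,3) and the last mode is passed through, while with skip the first mode is passed through and the blocks sit on modes (1,2) and (3,4).
L0 = ClementsLayer(5, withSkip = 0, phases = torch.randn(5))
L1 = ClementsLayer(5, withSkip = 1, phases = torch.randn(5))
print(L0.shape, L1.shape)   # both torch.Size([5, 5])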
These layers get multiplied to obtain the final model ansatz:
def Clements(dim, ClementsPhases, outputPhases):
    M = torch.eye(dim , dtype = torch.cfloat)
    # apply the dim layers, alternating between the two layer types
    for k in range(dim):
        M = torch.matmul(ClementsLayer(dim, withSkip = k % 2, phases = ClementsPhases[k,:]), M)
    # final column of output phase shifters
    M = torch.matmul(torch.diag(torch.exp(1j*outputPhases)) , M)
    return M
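As a quick sanity check (again with arbitrary random phases), the forward pass runs and the result looks unitary, so the construction itself seems fine:
U = Clements(5, torch.randn(5, 5), torch.randn(5))
print(U.shape)   # torch.Size([5, 5])
print(torch.allclose(U @ U.conj().T, torch.eye(5, dtype = U.dtype), atol = 1e-4))   # True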
Now, I am confronted with the problem that backward() and optimizer.step() don’t change the parameters in ClementsPhases. I did declare them as PyTorch parameters using nn.Parameter().
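For reference, the model is set up roughly like this (a simplified sketch; the class name and the random initialization are placeholders, but the phases are declared via nn.Parameter() just like this):
import torch.nn as nn

class ClementsModel(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.dim = dim
        # both phase sets are registered as trainable parameters
        self.ClementsPhases = nn.Parameter(torch.randn(dim, dim))
        self.outputPhases = nn.Parameter(torch.randn(dim))

    def forward(self):
        # uses the Clements function from above
        return Clements(self.dim, self.ClementsPhases, self.outputPhases)

MyModel = ClementsModel(dim = 5)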
I already tried to narrow down what the problem might be.
On a toy problem, I observed that the parameters in outputPhases do change, while those in ClementsPhases don’t.
If I use the command
for param in MyModel.parameters():
    print(param.grad)
after an optimizer step, I get an output of the form:
None
tensor([-1.3310e-04, -5.3921e-04, -8.4813e-05, 2.1805e-02, -2.1048e-02])
Here, by parameter counting, the tensor with 5 numbers in it should be the gradient for outputPhases, while the None then presumably refers to ClementsPhases.
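The same check can also be done with the names attached, which avoids the counting:
for name, param in MyModel.named_parameters():
    print(name, param.grad)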
If I directly try the command
torch.autograd.functional.jacobian(Clements5 , ( torch.randn(size = [5,5] , dtype = torch.float32) , torch.randn(size = [5] , dtype = torch.float32)))
where Clements5 is a copy of Clements with the dimension fixed to 5, essentially this wrapper:
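def Clements5(ClementsPhases, outputPhases):
    # same ansatz as above, with the dimension hard-coded to 5,
    # so that the jacobian is taken w.r.t. the two phase tensors only
    return Clements(5, ClementsPhases, outputPhases)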
then I get an output of the form
(tensor(<lots of zero matrices>), tensor(<matrices that are non-zero in exactly one column each>))
The second tensor is probably for outputPhases again (that would explain why exactly one column is non-zero). Then the many zero-matrices are probably for ClementsPhases.
So it appears to me that PyTorch cannot compute gradients for my ClementsPhases parameters, and I don’t know why this might be the case. Do my if-statements or loops break the computation graph or the backward() pass? Or is something else going wrong?