Hello,
In my model I do this operation, a matrix multiplication, in the forward pass:
new_D = ((torch.conj(M.T) @ self.W) / torch.linalg.norm(torch.conj(M.T) @ self.W, dim=0)).type(torch.complex128)
I then pass it through the layers, where W is the only thing that is learned, but I keep getting the same loss.
I use this new_D in the forward pass. Note that M changes from one batch to the next because I am doing online training, while self.W stays the same, so the cost function should decrease as I update the weights. But here are the results:
Yes, the gradient is calculated, but the issue is that when I train with numerous batches, the cost remains relatively stable. The outcome of the training ends up like this:
Otherwise, if I want to do it like this, where my input = x @ torch.conj(M) and the weights are multiplied by the same M, does this affect the backpropagation? As I said, I am performing online learning. Thank you in advance!
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch import Tensor

class MyModule(nn.Module):
    def __init__(self, W_init: Tensor, normalize: bool = True) -> None:
        super().__init__()
        if normalize:
            self.W = nn.Parameter(torch.tensor(W_init / np.linalg.norm(W_init, axis=0), dtype=torch.complex128))
        else:
            self.W = nn.Parameter(torch.tensor(W_init, dtype=torch.complex128))

    def forward(self, M: Tensor, x: Tensor) -> Tensor:
        new_D = ((torch.conj(M.T) @ self.W) / torch.linalg.norm(torch.conj(M.T) @ self.W, dim=0)).type(torch.complex128)
        out = (new_D @ x).mean()
        return out

w_init = torch.randn(2, 2)
module = MyModule(w_init)
optimizer = optim.Adam(module.parameters(), lr=0.01, weight_decay=0.5)
for i in range(10):
    optimizer.zero_grad()
    x = torch.randn(2, 2).to(torch.complex128)
    M = torch.randn(2, 2).to(torch.complex128)
    input = x @ M
    out = module(input, M)
    optimizer.step()
    out.norm().backward()
    '''for name, param in module.named_parameters():
        # Check whether the parameter has a non-zero gradient
        if param.grad is not None:
            print(f'Parameter: {name}, Gradient: \n{param.grad}')
        else:
            print(f'Parameter: {name}, No gradient computed')'''
    print('out', out.norm())
Every operation using a trainable parameter will be tracked by Autograd and will influence the gradients. If you are using self.W in any operation, this parameter will receive gradients and will then be updated assuming it was passed to an optimizer.
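To make that concrete, here is a minimal sketch (not taken from the post above; the shapes and names are only placeholders) showing that a trainable parameter still receives gradients even when it is only ever combined with a per-batch constant matrix M:

import torch
import torch.nn as nn

# Hypothetical minimal example: W is trainable, M is a fresh constant every "batch".
W = nn.Parameter(torch.randn(2, 2, dtype=torch.complex128))
for step in range(3):
    M = torch.randn(2, 2, dtype=torch.complex128)  # changes each batch, requires_grad=False
    x = torch.randn(2, 2, dtype=torch.complex128)
    new_D = torch.conj(M.T) @ W                    # same kind of product as in the post
    loss = (new_D @ x).mean().abs()                # real-valued scalar for backward()
    loss.backward()
    print(step, W.grad.abs().sum().item())         # non-zero: gradients reach W through M
    W.grad = None                                  # reset before the next batch

Because M does not require gradients, it simply acts as a constant in the graph; it scales the gradient that reaches W but does not block it.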
Yes, I appreciate the explanation.
However, in my scenario I keep running into the same issue: in each batch I multiply by a different matrix M, and the weights are multiplied by that same M, yet my cost function either remains constant or decreases only slightly.
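For what it's worth, in the loop posted above optimizer.step() is called before out.norm().backward(), so it is worth double-checking that each step actually sees freshly computed gradients. Below is a small diagnostic sketch (an illustration, not the original training code; it reuses MyModule from above and drops the weight_decay to isolate the gradient-driven updates) that uses the conventional zero_grad -> forward -> backward -> step ordering and prints the gradient norm of self.W every iteration:

import torch
import torch.optim as optim

module = MyModule(torch.randn(2, 2))
optimizer = optim.Adam(module.parameters(), lr=0.01)
for i in range(10):
    optimizer.zero_grad()
    x = torch.randn(2, 2).to(torch.complex128)
    M = torch.randn(2, 2).to(torch.complex128)
    out = module(x @ M, M)   # same call pattern as the posted loop
    loss = out.norm()
    loss.backward()          # compute gradients first
    print(i, loss.item(), module.W.grad.norm().item())  # confirm a non-zero gradient on W
    optimizer.step()         # then update self.W

If the printed gradient norm is non-zero every iteration but the loss still barely moves, the problem is more likely in the objective or the per-batch normalization than in backpropagation through M.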