I have a model in which I create a rotation matrix as a function of an angle, and I would like the angle to be a trainable parameter of the model. However, the angle is not updated, even though it has requires_grad=True.
def __init__(self,scale,center=None, angle=None):
if angle is None:
return torch.tensor([[torch.cos(angle), -torch.sin(angle)], [torch.sin(angle),
torch.cos(angle)]],requires_grad=True).view(1, 1, 2, 2)
What is the proper way to make sure that angle is being updated?
Recreating a tensor (in rotational_matrix) will detach self.angle from the computation graph, so you should use torch.stack or torch.cat to build the matrix from the existing tensors instead.
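A minimal sketch of building the matrix with torch.stack rather than torch.tensor is shown here. The class name `Rotation` and the simplified constructor are assumptions made for illustration; only `angle` and `rotational_matrix` come from the question.

```python
import torch
import torch.nn as nn


class Rotation(nn.Module):
    """Hypothetical module holding a trainable rotation angle (radians)."""

    def __init__(self, angle=0.0):
        super().__init__()
        # Registering the angle as a Parameter lets an optimizer update it.
        self.angle = nn.Parameter(torch.tensor(float(angle)))

    def rotational_matrix(self):
        cos, sin = torch.cos(self.angle), torch.sin(self.angle)
        # torch.stack keeps cos and sin attached to the graph, whereas
        # torch.tensor([...]) copies their values and drops the history.
        matrix = torch.stack([torch.stack([cos, -sin]),
                              torch.stack([sin, cos])])
        return matrix.view(1, 1, 2, 2)


model = Rotation(angle=0.3)
rot = model.rotational_matrix()
rot.sum().backward()
print(model.angle.grad)  # populated, since the graph reaches self.angle
```

The returned matrix does not need its own requires_grad=True; it inherits the gradient history from self.angle.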
A related question: Say I want to construct a tensor of the following form:
[[ σ,  0, -κ  ],
 [ 0,  κ,  0  ],
 [-κ,  0,  λ⁴σ]]
where κ is a function of two nn.Parameters, L and σ, and λ is a function of L. When I construct the tensor from the parameters directly, the same thing seems to happen: no gradient updates.
It works, however, when I construct the rows of the tensor individually (e.g. [0, 1, 0] * κ for the second row). Once I have created all rows by multiplying with the Parameters and concatenated them, I get gradient updates. Is this really the best way to construct such tensors? It seems to lead to a slightly more complex computation graph than it needs to be.
I’m not sure if I understand the issue correctly, but recreating a tensor or parameter will detach it from the computation graph as seen in the original issue.
If you have already created the parameters, you would have to construct the matrix with torch.cat or torch.stack.
Could you post a small code snippet to see your workflow in case I misunderstood your question?
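In the meantime, here is a rough sketch of how the 3x3 matrix from the question could be assembled with torch.stack. The formulas used for κ and λ are placeholders (the question does not state them), as are the variable names `L`, `sigma`, `kappa` and `lam`.

```python
import torch
import torch.nn as nn

L = nn.Parameter(torch.tensor(2.0))
sigma = nn.Parameter(torch.tensor(0.5))

# Placeholder definitions for illustration only; the real ones are not given.
lam = 3.0 ** 0.5 / L      # λ as some function of L
kappa = L * sigma         # κ as some function of L and σ

zero = torch.zeros(())    # constant entry, carries no gradient

# Each row is stacked from scalar tensors, so every entry stays in the graph.
M = torch.stack([
    torch.stack([sigma, zero, -kappa]),
    torch.stack([zero, kappa, zero]),
    torch.stack([-kappa, zero, lam ** 4 * sigma]),
])

M.sum().backward()
print(L.grad, sigma.grad)  # both populated
```

The matrix is written out cell by cell, which stays readable for larger matrices without multiplying constant rows by the parameters.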
Sorry for not being precise enough. Consider the following example module:
def __init__(self, ls):
    self.ls = nn.Parameter(torch.full((1,), ls))

def forward(self, X):
    # Create a matrix with some cells
    # dependent on `self.ls` via λ = √3 / self.ls:
    # F = [[  0,   1],
    #      [-λ², -2λ]]
    λ = torch.sqrt(torch.tensor([3.])) / self.ls
    # The following line does not work (self.ls won't receive gradients)
    F = torch.tensor([[0., 1.], [-1.*λ**2, -2*λ]])
    # The following works, but my question is whether it is the best approach,
    # since it can become tedious for larger matrices:
    F_0 = torch.tensor([[0., 1.]])
    F_1 = torch.tensor([[-1., -2.]]) * λ
    F_1 = F_1 * (torch.tensor([[1., 0.]]) * λ + torch.tensor([[0., 1.]]))
    F = torch.cat((F_0, F_1), 0)
    # Use F for computations with X
Thanks for the code snippet.
You are indeed recreating a tensor (F in your code), which will detach λ from the computation graph. Instead, you would need to use torch.zeros, etc. to create F.
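One way to read this suggestion, sketched under the assumption that `ls` starts at 2.0 and that the module wrapper is left out for brevity: allocate F with torch.zeros and write the λ-dependent cells into it by index. Autograd tracks these in-place assignments, so self.ls still receives gradients.

```python
import torch
import torch.nn as nn

ls = nn.Parameter(torch.full((1,), 2.0))

# λ = √3 / ls, squeezed to a scalar so it fits into a single matrix cell.
lam = (torch.sqrt(torch.tensor(3.0)) / ls).squeeze()

# Start from a constant matrix and fill in the cells one by one.
F = torch.zeros(2, 2)
F[0, 1] = 1.0
F[1, 0] = -lam ** 2   # in-place write is recorded in the graph
F[1, 1] = -2 * lam

F.sum().backward()
print(F.requires_grad, ls.grad)
```

Compared with multiplying constant rows by λ and concatenating, this keeps the code close to the written form of the matrix, which may help as the matrix grows.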