Model parameter not updated

markorei94 · December 6, 2019, 4:22pm

I have a model in which I create a rotation matrix as a function of an angle, and I would like the angle to be a trainable parameter of the model. However, angle is not updated during training, although it has requires_grad=True.

class Rotation(Affine):
    def __init__(self, scale, center=None, angle=None):
        super().__init__(scale, center)
        if angle is None:
            self.angle = nn.Parameter(torch.rand(1))
        else:
            self.angle = nn.Parameter(torch.tensor(angle, dtype=torch.float))
        self.shift = torch.zeros((1, 1, 2, 1))

    def forward(self, xy):
        def rotational_matrix(angle):
            # torch.tensor(...) builds a new tensor from the values of cos/sin(angle)
            return torch.tensor([[torch.cos(angle), -torch.sin(angle)],
                                 [torch.sin(angle), torch.cos(angle)]],
                                requires_grad=True).view(1, 1, 2, 2)
        return self.local_transformation(
            xy, self.global_transformation(xy, rotational_matrix(self.angle), self.shift))

What is the proper way to make sure that angle is being updated?

ptrblck · December 6, 2019, 9:39pm

Recreating a tensor with torch.tensor (in rotational_matrix) copies only the values, which detaches self.angle from the computation graph, so you should build the matrix with torch.cat or torch.stack instead.
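A minimal sketch of that suggestion, assuming a stripped-down module: `RotationSketch` is a hypothetical stand-in for `Rotation` (the `Affine` base class, `local_transformation` and `global_transformation` are not shown in the thread, so they are left out), and its forward pass simply applies the matrix to `xy` column vectors of shape (..., 2, 1).

```python
import torch
import torch.nn as nn


class RotationSketch(nn.Module):
    """Hypothetical stand-in for Rotation: only the matrix construction is shown."""

    def __init__(self, angle=None):
        super().__init__()
        init = torch.rand(()) if angle is None else torch.tensor(float(angle))
        self.angle = nn.Parameter(init)  # 0-dim trainable angle

    def rotation_matrix(self):
        c, s = torch.cos(self.angle), torch.sin(self.angle)
        # torch.stack keeps every entry connected to self.angle in the graph
        rows = [torch.stack([c, -s]), torch.stack([s, c])]
        return torch.stack(rows).view(1, 1, 2, 2)

    def forward(self, xy):
        # xy: (..., 2, 1) column vectors; batched matmul broadcasts R over them
        return self.rotation_matrix() @ xy


rot = RotationSketch(angle=0.5)
xy = torch.randn(4, 1, 2, 1)
rot(xy).sum().backward()
print(rot.angle.grad)  # a tensor now, instead of None
```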

chrisby · December 29, 2020, 3:35pm

A related question: Say I want to construct a tensor of the following form:

[[σ,  0,  -κ],
 [0,  κ,   0],
 [-κ, 0, λ⁴σ]]

where κ is a function of the two nn.Parameters L and σ, and λ is a function of L. When I construct the tensor directly from the parameters, the same thing seems to happen: no gradient updates.

It works, however, when I construct the rows of the tensor individually (e.g. [0, 1, 0] * κ for the second row). Once I have created all rows by multiplying with the parameters and concatenating them, I get gradient updates. Is this really the best way to construct such tensors? It seems like this leads to a slightly more complex computation graph than it needs to be.

ptrblck · December 30, 2020, 3:16am

I’m not sure if I understand the issue correctly, but recreating a tensor or parameter will detach it from the computation graph as seen in the original issue.
If you have already created the parameters, you would have to construct the matrix with cat or stack operations.
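A sketch of the stack approach for the 3×3 matrix in the question. The thread does not say how κ and λ depend on the parameters, so `kappa = sigma / L` and `lam = 1 / L` below are hypothetical placeholders; only the construction pattern matters.

```python
import torch
import torch.nn as nn


class MatrixSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.L = nn.Parameter(torch.tensor(2.0))
        self.sigma = nn.Parameter(torch.tensor(0.5))

    def build(self):
        # HYPOTHETICAL choices for κ(L, σ) and λ(L); the thread does not give them
        kappa = self.sigma / self.L
        lam = 1.0 / self.L
        zero = torch.zeros_like(kappa)  # constant entry, no gradient needed
        # Stack each row from 0-dim tensors, then stack the rows.
        # No torch.tensor(...) call, so every entry keeps its grad_fn.
        return torch.stack([
            torch.stack([self.sigma, zero, -kappa]),
            torch.stack([zero, kappa, zero]),
            torch.stack([-kappa, zero, lam ** 4 * self.sigma]),
        ])


m = MatrixSketch()
M = m.build()
M.sum().backward()
print(m.L.grad, m.sigma.grad)  # both parameters receive gradients
```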
Could you post a small code snippet to see your workflow in case I misunderstood your question?

chrisby · December 31, 2020, 2:48pm

Sorry for not being precise enough. Consider the following example module:

import torch
import torch.nn as nn

class Example(nn.Module):

    def __init__(self, ls):
        super(Example, self).__init__()
        self.ls = nn.Parameter(torch.full((1,), ls))

    def forward(self, X):
        # Create a matrix with some cells
        # dependent on `self.ls` via λ = √3 / self.ls:
        # F = [[  0,    1],
        #      [-λ², -2λ]]
        λ = torch.sqrt(torch.tensor([3.])) / self.ls
        # The following line does not work (self.ls won't receive gradients)
        F = torch.tensor([[0., 1.], [-1.*λ**2, -2*λ]])

        # The following works, but my question is whether it is the best approach,
        # since it can become tedious for larger matrices:
        F_0 = torch.tensor([[0., 1.]])
        F_1 = torch.tensor([[-1., -2.]]) * λ       # [[-λ, -2λ]]
        F_1 = F_1 * (torch.tensor([[1., 0.]]) * λ +
                     torch.tensor([[0., 1.]]))     # [[-λ², -2λ]]
        F = torch.cat((F_0, F_1), 0)

        # Use F for computations with X

ptrblck · January 1, 2021, 1:24am

Thanks for the code snippet.
You are indeed recreating a tensor (F in your code), which detaches λ from the computation graph. Instead, you would need to use torch.cat with torch.ones, torch.zeros, etc. to create F.
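A sketch of that suggestion applied to the Example module: each row is concatenated from torch.zeros/torch.ones and tensors derived from self.ls, and the two rows are then stacked. The `build_F` helper and the `X @ F.T` use in `forward` are hypothetical, since the original snippet leaves the use of F open.

```python
import torch
import torch.nn as nn


class Example(nn.Module):
    def __init__(self, ls):
        super().__init__()
        self.ls = nn.Parameter(torch.full((1,), ls))

    def build_F(self):
        lam = torch.sqrt(torch.tensor([3.0])) / self.ls  # λ, shape (1,)
        # F = [[0, 1], [-λ², -2λ]], assembled with cat/stack so λ stays in the graph
        row0 = torch.cat([torch.zeros(1), torch.ones(1)])
        row1 = torch.cat([-lam ** 2, -2 * lam])
        return torch.stack([row0, row1])  # shape (2, 2)

    def forward(self, X):
        # HYPOTHETICAL use of F: the original snippet leaves this step open
        return X @ self.build_F().T


model = Example(1.5)
X = torch.randn(5, 2)
model(X).sum().backward()
print(model.ls.grad)  # a tensor now, instead of None
```

Each entry of F is computed once from λ, so no multiplications by constant masks are needed, and the pattern extends row by row to larger matrices.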