Trying to find an angle (use of trigonometric functions); optimizer not updating

Hi everybody,

I’m currently trying to figure out how to use PyTorch to optimize an angle representing the angular part of an axis/angle rotation of one vector set into another. That is, I have two sets of vectors of, let’s say, shape (100, 3). One is the input, the other is the target. The target is equal to the input, rotated by the given axis/angle rotation. The angle for each sample includes some unknown Gaussian noise, so solving the equation system with a simple closed-form matrix operation isn’t possible.
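
For concreteness, my setup looks roughly like this (the z-axis, the noise level and the variable names here are just illustrative placeholders):

import numpy as np

rng = np.random.default_rng(0)
input = rng.standard_normal((100, 3))                          # 100 random 3D vectors
true_angle = 0.7                                               # the angle I want to recover
noisy_angles = true_angle + rng.normal(0.0, 0.05, size=100)    # per-sample Gaussian noise

def rot_z(a):
    # rotation matrix about the z-axis by angle a
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.], [s, c, 0.], [0., 0., 1.]])

# each target vector is the corresponding input vector, rotated by its noisy angle
target = np.stack([rot_z(a) @ v for a, v in zip(noisy_angles, input)])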

What I have so far is an nn.Module that executes the rotation with the current angle on the input. Code looks as follows:

class AngleModel(torch.nn.Module):
    def __init__(self):
        super(AngleModel, self).__init__()
        self.angle = nn.Parameter(torch.tensor(0.0, requires_grad = True))
        self.qw = torch.cos(self.angle / 2.)
        self.qx = torch.zeros(1)
        self.qy = torch.zeros(1)
        self.qz = torch.sin(self.angle / 2.)

    def forward(self, input):
        m11 = 1. - 2. * torch.pow(self.qy, 2) - 2. * torch.pow(self.qz, 2).requires_grad_()
        m22 = 1. - 2. * torch.pow(self.qx, 2) - 2. * torch.pow(self.qz, 2).requires_grad_()
        m33 = 1. - 2. * torch.pow(self.qx, 2) - 2. * torch.pow(self.qy, 2).requires_grad_()

        m21 = 2. * self.qx * self.qy - 2. * self.qz * self.qw
        m12 = 2. * self.qx * self.qy + 2. * self.qz * self.qw

        m31 = 2. * self.qx * self.qz + 2 * self.qy * self.qw
        m13 = 2. * self.qx * self.qz - 2 * self.qy * self.qw

        m32 = 2. * self.qy * self.qz - 2. * self.qx * self.qw
        m23 = 2. * self.qy * self.qz + 2. * self.qx * self.qw

        matrix = torch.Tensor([
            [m11, m21, m31],
            [m12, m22, m32],
            [m13, m23, m33],
        ])

        output = torch.matmul(input.float(), matrix.float())
        return output

I initialize the model and optimizer like this:

        model = AngleModel()
        crit = torch.nn.MSELoss()
        l_rate = 0.01
        optim = torch.optim.SGD(model.parameters(), lr = l_rate)
        epochs = 100

and execute the training like this:

        for epoch in range(epochs):
            _x = torch.tensor(input, requires_grad = True)
            _y = torch.tensor(target)

            optim.zero_grad()
            outputs = model.forward(_x)
            loss = crit(outputs, _y)
            loss.backward()
            optim.step()

            print("loss %05.3f; %s" % (loss.item(), model.angle.item()))

I suspect the trigonometric and pow functions are the culprit behind the angle not being updated. Am I right? angle.grad is None after loss.backward().
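
For reference, this is what I see after loss.backward():

print(model.angle.grad)     # prints None
print(model.angle.is_leaf)  # True, so the parameter itself looks fine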

How would I go on about this? What do I need to do in order to get this working?

Thank you very much!

You need to declare your parameters in the __init__, but put all computation (except for constants) in the forward pass.
You shouldn’t have to call requires_grad_ in the forward.
You should not use tensor or Tensor within your calculation. This will break the graph.
It isn’t entirely clear to me whether you want qx and qy to be learnable in some way, too (otherwise you could just write them as 0 rather than using torch.zeros). You could leave those in the __init__ if they’re constant.

class AngleModel(torch.nn.Module):
    def __init__(self):
        super(AngleModel, self).__init__()
        self.angle = nn.Parameter(torch.tensor(0.0)) # parameter will have requires_grad by default, so no need to pass in a tensor requiring grad.

    def forward(self, input):
        # make qx...qz locals
        qx = torch.zeros(1)
        qy = torch.zeros(1)
        # qw and qz are dependent on the angle, so we need to recompute them here
        qw = torch.cos(self.angle / 2.)
        qz = torch.sin(self.angle / 2.)
        # there isn't anything wrong with using torch.pow, but personally, I like **
        m11 = 1. - 2. * qy**2 - 2. * qz**2
        m22 = 1. - 2. * qx**2 - 2. * qz**2
        m33 = 1. - 2. * qx**2 - 2. * qy**2

        m21 = 2. * qx * qy - 2. * qz * qw
        m12 = 2. * qx * qy + 2. * qz * qw

        m31 = 2. * qx * qz + 2 * qy * qw
        m13 = 2. * qx * qz - 2 * qy * qw

        m32 = 2. * qy * qz - 2. * qx * qw
        m23 = 2. * qy * qz + 2. * qx * qw

        # no one uses Tensor anymore, and tensor isn't right here either, so you have to
        # cat/stack your way to the matrix or do something differently elsewhere
        matrix = torch.stack([
            torch.cat([m11, m21, m31], dim=0),
            torch.cat([m12, m22, m32], dim=0),
            torch.cat([m13, m23, m33], dim=0),
        ], dim=0)

        output = torch.matmul(input, matrix)
        return output

or something similar should work (I didn’t run your code because your example isn’t completely self-contained, nor did I check the maths).
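
As a quick sanity check (with random stand-in data, not your actual vectors), the gradient should now reach the parameter:

model = AngleModel()
x = torch.randn(100, 3)
y = torch.randn(100, 3)
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
print(model.angle.grad)  # now a number instead of None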

Best regards

Thomas

Works like a charm! Thank you very much! That’s what I figured: it had something to do with a broken graph…

Since I know the axis of the rotation and it is one of the three main axes of the coordinate system (here the z-axis), the structure of the corresponding quaternion is already fixed: q = (cos(angle/2), 0, 0, sin(angle/2)). That’s why I don’t need to learn qx and qy; they can both be 0.

For the sake of completeness, the working final code of the nn.Module looks like this now:

class AngleModel(nn.Module):
    def __init__(self):
        super(AngleModel, self).__init__()
        self.angle = nn.Parameter(torch.tensor(0.0))

    def forward(self, input):
        qw = torch.cos(self.angle / 2.)
        qx = 0.0
        qy = 0.0
        qz = torch.sin(self.angle / 2.)

        matrix = torch.zeros(3, 3)

        matrix[0, 0] = 1. - 2. * qy ** 2 - 2. * qz ** 2
        matrix[1, 1] = 1. - 2. * qx ** 2 - 2. * qz ** 2
        matrix[2, 2] = 1. - 2. * qx ** 2 - 2. * qy ** 2

        matrix[0, 1] = 2. * qx * qy - 2. * qz * qw
        matrix[1, 0] = 2. * qx * qy + 2. * qz * qw

        matrix[0, 2] = 2. * qx * qz + 2 * qy * qw
        matrix[2, 0] = 2. * qx * qz - 2 * qy * qw

        matrix[1, 2] = 2. * qy * qz - 2. * qx * qw
        matrix[2, 1] = 2. * qy * qz + 2. * qx * qw

        output = torch.matmul(input, matrix)
        return output
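
For reference, this is roughly how I train it, with the same settings as in my first post (the data is just the illustrative synthetic set from the top):

model = AngleModel()
crit = nn.MSELoss()
optim = torch.optim.SGD(model.parameters(), lr=0.01)

_x = torch.tensor(input, dtype=torch.float32)
_y = torch.tensor(target, dtype=torch.float32)

for epoch in range(100):
    optim.zero_grad()
    loss = crit(model(_x), _y)
    loss.backward()
    optim.step()

# settles near the generating angle (up to sign, depending on the rotation
# convention used for the data); more epochs or a larger lr may be needed
print(model.angle.item())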

@Ingenieur

Can you provide sample inputs where there is a learnable function that maps input to target?

I created an example notebook for running this code: