Parameters added to a pre-trained model have None grads

Hi everybody,

I've trained a model on 3D data and now I'd like to add a learnable rotation matrix and apply it to each new batch right after loading it (with the model's parameters frozen).

This is my Rotation_Matrix class:

class Rotation_Matrix(nn.Module):
    def __init__(self):
        super(Rotation_Matrix, self).__init__()
        self.a = nn.Parameter(torch.tensor(1.)) # init value of Rx cosine
        self.b = nn.Parameter(torch.tensor(1.)) # init value of Ry cosine
        self.c = nn.Parameter(torch.tensor(1.)) # init value of Rz cosine
        
    def get_rotation_matrix(self):
        Rx = torch.tensor([[1., 0., 0.],
                          [0., self.a, -torch.sqrt(1 - torch.pow(self.a,2))],
                          [0., torch.sqrt(1 - torch.pow(self.a,2)), self.a]])

        Ry = torch.tensor([[self.b, 0., torch.sqrt(1 - torch.pow(self.b,2))],
                          [0., 1., 0.],
                          [-torch.sqrt(1 - torch.pow(self.b,2)), 0, self.b]])

        Rz = torch.tensor([[self.c, -torch.sqrt(1 - torch.pow(self.c,2)), 0.],
                          [torch.sqrt(1 - torch.pow(self.c,2)), self.c, 0.],
                          [0., 0., 1.]])

        return torch.mm(Rx, torch.mm(Ry, Rz))
    
    def forward(self, x):
        self.matrix = self.get_rotation_matrix()
        return torch.mm(x, self.matrix)

Then comes the initialisation and training of the rotation matrix. I haven't taken care of forcing the rotation's parameters to stay in the [-1, 1] interval just yet.

# load pre-trained model
model.load_state_dict(...)
model.eval()

rotation_matrix = Rotation_Matrix()
optimizer = torch.optim.Adam(rotation_matrix.parameters(), lr=l_rate)

rotation_matrix.train()

for i in range(n_epochs):
    for j, x in enumerate(dataloader):
        x = rotation_matrix(x)
        x = model(x)
        
        loss = loss_fun(x)
        
        optimizer.zero_grad()
        rotation_matrix.zero_grad()

        loss.backward(retain_graph=True)
        

Printing the .grad of any parameter from rotation_matrix gives None.
I read some related topics but didn't find a solution. I tried calling 'retain_grad()' on the parameters before 'loss.backward()' and I played with the 'autograd.grad()' function, but with no results. What am I missing here? In addition, should I wrap 'x = model(x)' in 'with torch.no_grad():', or will that prevent gradients from reaching the rotation's parameters?
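
For reference, this is roughly what I mean (just a sketch of the question, not working code):

for j, x in enumerate(dataloader):
    x = rotation_matrix(x)      # rotation parameters should get gradients through this
    with torch.no_grad():       # or does this block them from being computed?
        x = model(x)            # frozen pre-trained model
    loss = loss_fun(x)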

Thank you for all answers,
MS.

Hi,

So you are missing one important part, which is calling optimizer.step() after loss.backward(). The .backward() function merely computes the gradients of each parameter with respect to the loss; it is the .step() function that actually applies the updates to the parameters. Also, optimizer.zero_grad() already sets the gradients to zero, so there is no need to call rotation_matrix.zero_grad() after it.
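
With the names from your post, the usual pattern looks roughly like this (a minimal sketch):

for i in range(n_epochs):
    for j, x in enumerate(dataloader):
        optimizer.zero_grad()      # clears the grads of rotation_matrix's parameters

        x = rotation_matrix(x)
        x = model(x)
        loss = loss_fun(x)

        loss.backward()            # computes .grad for every parameter in the graph
        optimizer.step()           # applies the update using those grads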

Hi Diego,

Of course, I forgot that line. However, the gradients are still not being computed. As I mentioned in the first post, the problem is the None values of the parameters' grads after calling .backward(), not the application of correctly computed gradients (which is done by optimizer.step()).

self.matrix will be a plain tensor, not an nn.Parameter, if you just wrap single nn.Parameters in another tensor: torch.tensor(...) copies their values and detaches the result from the graph, so no gradient can flow back to a, b and c.
You can check it by calling print(rotation_matrix.matrix).
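
For example, after a single forward pass with your original class (a quick check):

rotation_matrix = Rotation_Matrix()
_ = rotation_matrix(torch.randn(1, 3))       # forward builds self.matrix

# self.matrix was rebuilt via torch.tensor(...), which copies the values of
# a, b and c and detaches them from the graph
print(rotation_matrix.matrix.requires_grad)  # False
print(rotation_matrix.matrix.grad_fn)        # None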

If you create the whole matrix as an nn.Parameter, the code should work:

class Rotation_Matrix(nn.Module):
    def __init__(self):
        super(Rotation_Matrix, self).__init__()
        self.a = torch.tensor(1.) # init value of Rx cosine
        self.b = torch.tensor(1.) # init value of Ry cosine
        self.c = torch.tensor(1.) # init value of Rz cosine
        self.matrix = self.get_rotation_matrix()
        
    def get_rotation_matrix(self):
        Rx = torch.tensor([[1., 0., 0.],
                          [0., self.a, -torch.sqrt(1 - torch.pow(self.a,2))],
                          [0., torch.sqrt(1 - torch.pow(self.a,2)), self.a]])

        Ry = torch.tensor([[self.b, 0., torch.sqrt(1 - torch.pow(self.b,2))],
                          [0., 1., 0.],
                          [-torch.sqrt(1 - torch.pow(self.b,2)), 0, self.b]])

        Rz = torch.tensor([[self.c, -torch.sqrt(1 - torch.pow(self.c,2)), 0.],
                          [torch.sqrt(1 - torch.pow(self.c,2)), self.c, 0.],
                          [0., 0., 1.]])

        return nn.Parameter(torch.mm(Rx, torch.mm(Ry, Rz)))
    
    def forward(self, x):
        return torch.mm(x, self.matrix)
    
model = nn.Linear(3, 1)
model.eval()

rotation_matrix = Rotation_Matrix()
optimizer = torch.optim.Adam(rotation_matrix.parameters(), lr=1.0)

rotation_matrix.train()

x = torch.randn(1, 3)
target = torch.randn(1, 1)
criterion = nn.MSELoss()

for i in range(10):
    optimizer.zero_grad()

    output = rotation_matrix(x)
    output = model(output)    
    loss = criterion(output, target)

    loss.backward()
    print(rotation_matrix.matrix.grad)
    optimizer.step()

Thanks for the answer.

After the changes you suggested, gradients are computed correctly for every element of the matrix. This causes another problem, though. After the weight update the matrix is no longer a rotation matrix (orthogonal, with determinant equal to 1). I want to preserve its structure so that after each iteration it is still a rotation matrix, just like in the constructor. So I'd like to update only the parameters a, b and c.
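
To make it concrete, this is roughly the kind of thing I'm after (an untested sketch; I assume that building the matrix with torch.stack instead of torch.tensor keeps the graph connected to a, b and c):

class Rotation_Matrix(nn.Module):
    def __init__(self):
        super(Rotation_Matrix, self).__init__()
        self.a = nn.Parameter(torch.tensor(1.)) # init value of Rx cosine
        self.b = nn.Parameter(torch.tensor(1.)) # init value of Ry cosine
        self.c = nn.Parameter(torch.tensor(1.)) # init value of Rz cosine

    def get_rotation_matrix(self):
        zero, one = torch.tensor(0.), torch.tensor(1.)
        sa = torch.sqrt(1 - torch.pow(self.a, 2)) # sine of Rx angle
        sb = torch.sqrt(1 - torch.pow(self.b, 2)) # sine of Ry angle
        sc = torch.sqrt(1 - torch.pow(self.c, 2)) # sine of Rz angle

        # build the rows with torch.stack so gradients can flow back to a, b, c
        Rx = torch.stack([torch.stack([one, zero, zero]),
                          torch.stack([zero, self.a, -sa]),
                          torch.stack([zero, sa, self.a])])

        Ry = torch.stack([torch.stack([self.b, zero, sb]),
                          torch.stack([zero, one, zero]),
                          torch.stack([-sb, zero, self.b])])

        Rz = torch.stack([torch.stack([self.c, -sc, zero]),
                          torch.stack([sc, self.c, zero]),
                          torch.stack([zero, zero, one])])

        return torch.mm(Rx, torch.mm(Ry, Rz))

    def forward(self, x):
        return torch.mm(x, self.get_rotation_matrix())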

In addition, another question came up: where do I restrict a, b and c to the [-1, 1] interval so that these parameters behave like cosines? I thought of something like:

rotation_matrix.a = torch.max(torch.tensor(-1.), torch.min(rotation_matrix.a, torch.tensor(1.)))

but I have no idea where in the code it belongs.