PyTorch hook gradients differ from parameter gradients

Hi! I'm trying to learn the best affine transformation for cropping an image. I take an image and build a 2x3 matrix, call it M, whose entries are:

[scale * 1, 0, dx
 0, scale * 1, dy]

where scale, dx, and dy are the parameters being learned.

For the explanation below, I will refer to the entries of M as:
[a, b, c
d, e, f]

I then pass the learned matrix to affine_grid and grid_sample to transform the image, and use a custom pixel-to-pixel loss against the ground-truth crop to learn M on a single image.
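A minimal, self-contained sketch of this setup may make the pipeline easier to follow. It assumes a random 3x8x8 tensor in place of the real image and F.mse_loss in place of the custom loss; both are stand-ins, not the actual data or LapLoss. It calls retain_grad() on M so that M.grad can be read after backward() alongside the parameter gradients, as an alternative to printing from a hook.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-ins (assumptions): random tensors instead of the real image and crop.
image = torch.rand(3, 8, 8)
target = torch.rand(3, 8, 8)

# Learnable parameters, shaped like the ones in the post.
scale = torch.tensor([[1.0]], requires_grad=True)
dxdy = torch.tensor([[0.1], [-0.2]], requires_grad=True)

# M = [[scale, 0, dx], [0, scale, dy]]
M = torch.cat((torch.eye(2) * scale, dxdy), dim=1)
M.retain_grad()  # keep the gradient of this intermediate tensor after backward()

# Warp the image with M, then compare against the target.
grid = F.affine_grid(M.unsqueeze(0), [1] + list(image.shape), align_corners=False)
warped = F.grid_sample(image.unsqueeze(0), grid, mode="bilinear", align_corners=False)[0]
loss = F.mse_loss(warped, target)  # stand-in for the custom pixel-to-pixel loss
loss.backward()

print("M.grad:", M.grad)
print("scale.grad:", scale.grad)
print("dxdy.grad:", dxdy.grad)
```

Because all three gradients are read after the same backward() call, there is no ambiguity about which iteration each printed value belongs to.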

Problems:

First, when I print the gradients of the dx, dy, and scale parameters directly, they differ from the gradients of the M matrix. This looks wrong to me.

Second, when I print M's gradients, the "b" and "d" entries should never have any gradient, since those two are not parameters, yet they show non-zero gradients. Also, "a" and "e" should have equal gradients, since both are just scale * 1.

The code and its printed output are below. Please compare the matrix gradients with the individual scale and translation gradients where the counters in the print statements are equal.

code:

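import numpy as np
import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF
from PIL import Image

# LapLoss is my custom pixel-to-pixel loss (definition not shown here).

cat = TF.to_tensor(np.array(Image.open("images/just_dog.png").convert('RGB')))
cat_dog = TF.to_tensor(np.array(Image.open("images/mask_rect_65_5_30_35_dog_patch.png").convert('RGB')))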
folder_path = "images/results_mask_dog_rect_65_5_30_35_patch/"
translated_params = torch.unsqueeze(torch.tensor([0.0,0.0]),1)
translated_params.requires_grad_(True)

scale = torch.unsqueeze(torch.tensor([1.0]),1)
scale.requires_grad_(True)
counter = 0
loss_level = 0

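def forward2(x, dxdy, the_scale):
    # Build the 2x3 affine matrix [[scale, 0, dx], [0, scale, dy]].
    M = torch.cat((torch.eye(2) * the_scale, dxdy), dim=1)
    # The hook fires during backward(), before counter is incremented in the loop.
    M.register_hook(lambda grad: print("counter is: ", counter, "Matrix grads are: ", grad))

    # Warp the image with the affine matrix.
    grid = F.affine_grid(torch.unsqueeze(M, dim=0), [1] + list(x.shape))
    transformed_image = F.grid_sample(x[None, :, :, :], grid, mode='bilinear')[0]

    return transformed_image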

optimizer = torch.optim.Adam([scale,translated_params], lr=0.007)

for i in range(601): 
    optimizer.zero_grad()
    predicted= forward2(cat_dog,translated_params,scale)
    criterion = LapLoss(loss_level=loss_level)

    if i == 0:
        criterion = LapLoss(loss_level=loss_level, save=True)

    # Every 200 iterations, pass save=True, then cycle loss_level 0 -> 1 -> 2 -> 0.
    if i % 200 == 0 and i != 0:
        criterion = LapLoss(loss_level=loss_level, save=True)
        loss_level = loss_level + 1
        if loss_level > 2:
            loss_level = 0

    loss = criterion.forward(torch.unsqueeze(predicted,0),torch.unsqueeze(cat,0))
    
    loss.backward()
    
    counter = counter + 1 
    
    optimizer.step()

    print("counter is: ", counter, "gradients are: ", "translated=", translated_params.grad, "scale=", scale.grad)

printed:

counter is:  0 Matrix grads are:  tensor([[-0.1344,  0.0153,  0.4624],
        [ 0.1192, -0.4274,  0.3631]])
counter is:  1 Inidivdual gradients are:  translated= tensor([[0.4624],
        [0.3631]]) scale= tensor([[-0.5617]])
counter is:  1 Matrix grads are:  tensor([[-0.5981,  0.0277,  1.2482],
        [ 0.1807, -0.8871,  0.9514]])
counter is:  2 Inidivdual gradients are:  translated= tensor([[1.2482],
        [0.9514]]) scale= tensor([[-1.4852]])
counter is:  2 Matrix grads are:  tensor([[ 0.0132,  0.1046,  0.6005],
        [ 0.3383, -0.6538,  0.7037]])
counter is:  3 Inidivdual gradients are:  translated= tensor([[0.6005],
        [0.7037]]) scale= tensor([[-0.6406]])
counter is:  3 Matrix grads are:  tensor([[ 0.5854,  0.1962, -0.0363],
        [ 0.4518, -0.3168,  0.3824]])
counter is:  4 Inidivdual gradients are:  translated= tensor([[-0.0363],
        [ 0.3824]]) scale= tensor([[0.2686]])
counter is:  4 Matrix grads are:  tensor([[ 0.8189,  0.1707, -0.2715],
        [ 0.4192, -0.1115,  0.1299]])
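One possible way to read these numbers is sketched below in plain Python. It assumes that the hook output labelled counter k belongs to the same backward pass as the individual gradients labelled k + 1, since the hook fires inside loss.backward() and counter is only incremented afterwards. It then applies the chain rule through M by hand: scale appears in both "a" and "e", so its gradient is the sum of those two entries, while dx and dy map directly to "c" and "f". The helper name param_grads_from_matrix_grad is hypothetical, and the values are copied from the output above, rounded to four decimals. This is a consistency check under those assumptions, not a definitive diagnosis.

```python
# (grad of M, grad of translated_params, grad of scale), copied from the printed output.
# Each matrix gradient is paired with the individual gradients printed right after it.
printed = [
    ([[-0.1344, 0.0153, 0.4624], [0.1192, -0.4274, 0.3631]], [0.4624, 0.3631], -0.5617),
    ([[-0.5981, 0.0277, 1.2482], [0.1807, -0.8871, 0.9514]], [1.2482, 0.9514], -1.4852),
    ([[0.0132, 0.1046, 0.6005], [0.3383, -0.6538, 0.7037]], [0.6005, 0.7037], -0.6406),
    ([[0.5854, 0.1962, -0.0363], [0.4518, -0.3168, 0.3824]], [-0.0363, 0.3824], 0.2686),
]


def param_grads_from_matrix_grad(G):
    """Chain rule through M = [[s, 0, dx], [0, s, dy]] (hypothetical helper)."""
    scale_grad = G[0][0] + G[1][1]  # s appears in entries a and e
    dxdy_grad = [G[0][2], G[1][2]]  # dx and dy are entries c and f
    # Entries b and d are constants, so their gradients reach no parameter.
    return scale_grad, dxdy_grad


TOL = 2e-4  # allows for the 4-decimal rounding of the printed values
results = []
for G, dxdy, s in printed:
    s_hat, dxdy_hat = param_grads_from_matrix_grad(G)
    ok = abs(s_hat - s) <= TOL and all(
        abs(p - q) <= TOL for p, q in zip(dxdy_hat, dxdy)
    )
    results.append(ok)
    print("a + e =", round(s_hat, 4), "| printed scale grad =", s, "| consistent:", ok)
```

If this pairing is right, the non-zero "b" and "d" values are gradients of the loss with respect to those matrix entries as seen by the hook, and they are multiplied by the zeros of torch.eye(2) before reaching scale.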

Thank you so much! Please let me know if you'd like to run it yourselves, and I'll post the rest of the code.