Loss.backward() not updating model gradients

I’m trying to use a pretrained keypoint detector model from torchvision to predict keypoints on regions extracted from an image, and I’d then like to visualize a saliency map of the prediction. My loss function is just the Euclidean distance between each predicted set of keypoints and the closest ground-truth set. However, after I call loss.backward(), printing model.layer.[weight/bias].grad gives nothing (the gradients are None). I was checking this to make sure gradients were flowing back to the input image. I’ve included the code below.

  • dets_for_kp is a list of Tensors (CxHxW) that have requires_grad=True (a rough sketch of how they are built follows this list).
  • kp is the name of the model. It is a keypointrcnn_resnet50_fpn from torchvision models and is set to eval().
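
For context, this is roughly how the crops are built (a simplified sketch; crop_boxes and image_tensor are placeholders for my actual detection and cropping code):

    # hypothetical sketch of how the input crops are prepared
    dets_for_kp = []
    for (x0, y0, x1, y1) in crop_boxes:               # placeholder: detected regions
        crop = image_tensor[:, y0:y1, x0:x1].clone()  # CxHxW float tensor in [0, 1]
        crop.requires_grad_(True)                     # so gradients can reach the input
        dets_for_kp.append(crop)

The section where the problem shows up is below: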
    import numpy as np
    import torch

    # make sure every model parameter requires grad
    for p in kp.parameters():
        p.requires_grad = True

    # run detections through keypoint detector
    kps = kp(dets_for_kp)
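    # in eval mode, kps is a list with one dict per input image
    # (keys include 'boxes', 'scores', 'keypoints', ...)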

    # we could come up with a single keypoint estimate per image by either averaging
    # or a linear combination weighted by scores
    best = []
    valids = []
    for j, k in enumerate(kps):
        keypoints = k['keypoints']

        if len(keypoints) == 0:
            continue

        # keep the first (highest-scoring) detection's keypoints, rounded to pixel coords
        best.append(torch.round(keypoints[0]))

        # remember which inputs actually produced a detection
        valids.append(j)

    # convert keypoints back to image coordinates. keypoints are (x,y,visible)
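    # (reverse_info[j] is assumed to hold the crop's bbox in the original image,
    #  plus the height/width padding offsets ph and pw added to the crop before detection)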
    for i in range(len(valids)):
        j = valids[i]
        bbox, ph, pw = reverse_info[j]

        best[i][:,0] += (bbox[0] - pw[0])
        best[i][:,1] += (bbox[1] - ph[0])

    # loop over keypoints and compare them to the ground truth
    # since we don't know a priori which gt and which annotations are closest, and there aren't
    # too many, just loop over each to get centroids and use closest centroid of gt for each 
    # annotation. loss is L2
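    # annos is assumed to be a list of flat, COCO-style keypoint lists,
    # i.e. [x1, y1, v1, x2, y2, v2, ...], so every third value is x, y, or visibility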
    anno_centroids = []
    for a in annos:
        x = np.array(a)[0::3].mean()
        y = np.array(a)[1::3].mean()
        z = np.array(a)[2::3].mean()
        anno_centroids.append(np.array([x,y,z]))

    k_centroids = []
    for b in best:
        centroid = b.mean(0)
        k_centroids.append(centroid)

    c_matches = []
    for jj, c in enumerate(k_centroids):
        closest_dist = 1e10
        closest_centroid = None

        for ii, ac in enumerate(anno_centroids):
            dist = torch.norm(c - torch.Tensor(ac))
            if dist < closest_dist:
                closest_dist = dist
                closest_centroid = ii

        # match this prediction to the index of the closest ground-truth centroid
        c_matches.append((jj, closest_centroid))

    loss = torch.Tensor([0])
    for m in c_matches:
        loss += torch.norm(best[m[0]].view(-1) - torch.Tensor(annos[m[1]]))

    loss.backward()

You can verify that the gradients are missing (None) by printing

    kp.roi_heads.keypoint_predictor.kps_score_lowres.bias.grad

or

    dets_for_kp[0].grad
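
In case it helps, here is the fuller check I run right after loss.backward() (just looping over the named parameters and printing whatever gradient exists):

    # check which model parameters received a gradient
    for name, p in kp.named_parameters():
        print(name, None if p.grad is None else p.grad.abs().sum().item())

    # check whether the gradient reached the input crops
    print(dets_for_kp[0].grad)   # this is also None for me

Every entry comes back None.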

I’ve made sure every parameter is set to requires_grad=True, along with the input. I’ve also wondered whether I’m computing the loss incorrectly; I tried nn.MSELoss instead but had no luck. What am I doing wrong?
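
For completeness, the nn.MSELoss attempt looked roughly like this (a simplified reconstruction using the same matching as above, not the exact code I ran):

    # variant of the loss using MSELoss instead of the L2 distance
    mse = torch.nn.MSELoss()
    loss = torch.zeros(1)
    for m in c_matches:
        loss = loss + mse(best[m[0]].view(-1), torch.Tensor(annos[m[1]]))
    loss.backward()

It behaved the same way: all the .grad fields stayed None.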