I am making some changes to the standard Keypoint R-CNN implemented in PyTorch. Specifically, I want to replace the KeypointRCNNPredictor module with a Graph Convolutional Network (GCN).
(Image: my own unfinished diagram of Keypoint R-CNN, for illustrative purposes only. Some shapes and arrows may be wrong.)
The standard keypoint predictor takes a feature map of size 512×14×14 for each of the B proposed bounding boxes (B×512×14×14) and applies transposed convolutions and interpolation to upscale the feature maps. The output is therefore B×P heatmaps encoding how likely each of the P keypoints is to be at each position within each of the B boxes (B×P×56×56). These logits are then passed to the keypoint loss function (keypointrcnn_loss) implemented in the RoIHeads source code. This loss takes the target keypoints, transforms them into heatmaps that are all zeros except for a one at the correct position, and then computes the cross-entropy loss between the target heatmaps and the logit heatmaps output by the KeypointRCNNPredictor.
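To make the description above concrete, here is a minimal sketch of how I understand that loss. It is my own simplification, not the torchvision code: the function name `heatmap_cross_entropy` is hypothetical, the targets are assumed to be already expressed as integer pixel coordinates inside the 56×56 heatmap, and keypoint visibility is ignored.

```python
import torch
import torch.nn.functional as F


def heatmap_cross_entropy(logits, target_xy):
    """Simplified heatmap loss (hypothetical helper, not torchvision's).

    logits:    (B, P, H, W) raw heatmaps from the keypoint predictor.
    target_xy: (B, P, 2) integer (x, y) pixel coordinates inside the heatmap.
    """
    B, P, H, W = logits.shape
    # Treat every heatmap as a classification over its H*W pixels.
    flat_logits = logits.reshape(B * P, H * W)
    # Row-major index of the target pixel; this plays the role of the
    # "all zeros and a one" target heatmap.
    target_idx = (target_xy[..., 1] * W + target_xy[..., 0]).reshape(B * P).long()
    return F.cross_entropy(flat_logits, target_idx)


B, P, H, W = 4, 17, 56, 56
logits = torch.randn(B, P, H, W, requires_grad=True)
target_xy = torch.randint(0, W, (B, P, 2))
loss = heatmap_cross_entropy(logits, target_xy)
loss.backward()  # gradients flow back to the logits
```

Note that the one-hot target is never built explicitly here: passing the pixel index to `F.cross_entropy` is equivalent.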
The GCN also takes feature maps as input, but it outputs a list of keypoints. For each of the B bounding boxes, it outputs the x,y coordinates of the P keypoints within that box (B×P×2).
If I replace the standard keypoint predictor with the GCN, I have two options:
- Option 1. (hard to implement but more correct from my point of view) Override the keypoint loss function so that it is an MSELoss, or some other distance loss, computed between the points predicted by the GCN and the target coordinates. This also requires reimplementing the keypoint inference function within RoIHeads, because the forward pass takes different paths in train and eval mode.
- Option 2. (very easy if doable) Convert the coordinate output of the GCN into heatmaps and let the loss and inference functions work as usual with heatmaps.
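For Option 1, the loss itself would be something like the sketch below. The function name `coordinate_loss` and the `visible` mask are my own assumptions for illustration; the predictions and targets are assumed to be in the same coordinate frame (for example, both relative to the box).

```python
import torch
import torch.nn.functional as F


def coordinate_loss(pred_xy, target_xy, visible):
    """Distance loss on coordinates (hypothetical helper).

    pred_xy, target_xy: (B, P, 2) float coordinates in the same frame.
    visible:            (B, P) bool mask, True where the keypoint is annotated.
    """
    if not visible.any():
        # No annotated keypoints: return a zero that is still attached to the graph.
        return pred_xy.sum() * 0
    # Boolean indexing keeps only annotated keypoints, giving (N, 2) tensors.
    return F.mse_loss(pred_xy[visible], target_xy[visible])


pred_xy = torch.rand(4, 17, 2, requires_grad=True)
target_xy = torch.rand(4, 17, 2)
visible = torch.ones(4, 17, dtype=torch.bool)
loss = coordinate_loss(pred_xy, target_xy, visible)
loss.backward()
```

This part is simple. The hard part of Option 1 is everything around it in RoIHeads that expects heatmaps.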
The first option is doable, but it is truly a nightmare because many functions within RoIHeads and the whole GeneralizedRCNN seem to assume that the keypoint predictor outputs the keypoints as heatmaps.
The second option, converting the coordinates to heatmaps, is as easy as creating, for every predicted keypoint, a heatmap of the desired W×H with a parabola centered at the predicted keypoint coordinate. My question is: how can I create this heatmap so that its creation is correctly included in the computation graph for backpropagation?
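What I have in mind is roughly the sketch below. It is only an illustration under my own assumptions: the function name `coords_to_heatmap_logits` and the `sigma` parameter are hypothetical, and the coordinates are assumed to be floats already expressed in heatmap pixel units. The heatmap is computed as an analytic function of the coordinates on a pixel grid (a downward parabola, which becomes a Gaussian after a softmax). Nothing is rounded or written by index, since an assignment such as `heatmap[y, x] = 1` would not pass gradients back to the coordinates.

```python
import torch
import torch.nn.functional as F


def coords_to_heatmap_logits(xy, height, width, sigma=2.0):
    """Render (B, P, 2) float (x, y) coordinates as (B, P, H, W) logits.

    Each heatmap is a downward parabola peaking at the keypoint, built only
    from differentiable tensor operations on xy.
    """
    # Pixel grids, shaped so they broadcast against (B, P, 1, 1) centers.
    ys = torch.arange(height, dtype=xy.dtype, device=xy.device).view(1, 1, height, 1)
    xs = torch.arange(width, dtype=xy.dtype, device=xy.device).view(1, 1, 1, width)
    cx = xy[..., 0].unsqueeze(-1).unsqueeze(-1)
    cy = xy[..., 1].unsqueeze(-1).unsqueeze(-1)
    sq_dist = (xs - cx) ** 2 + (ys - cy) ** 2  # (B, P, H, W)
    return -sq_dist / (2 * sigma ** 2)


# One box, one keypoint predicted at (x=10, y=20); the target pixel is (12, 22).
xy = torch.tensor([[[10.0, 20.0]]], requires_grad=True)
heatmap = coords_to_heatmap_logits(xy, 56, 56)
target_idx = torch.tensor([22 * 56 + 12])
loss = F.cross_entropy(heatmap.reshape(1, -1), target_idx)
loss.backward()  # xy.grad now holds the gradient w.r.t. the coordinates
```

The peak width `sigma` is a free choice here; I do not know which value would work best with the existing cross-entropy loss.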
I am not an expert in Autograd, but I assume that it has to be done in some specific way for the backpropagation to work correctly.
Many thanks for your help!!