Paper behind Keypoint R-CNN implementation in torchvision

Hi team,
I can’t seem to find the paper behind the Keypoint R-CNN implementation in torchvision.
I’d like to understand and study the architecture better, so I’m looking for a reference in the literature.
Is it actually just the Mask R-CNN paper?
That paper does cover keypoint detection.


The initial PR that created Keypoint R-CNN was this one, in which the code was split into different files by @fmassa.
Based on this, I assume the Mask R-CNN paper was the basis for these implementations, but let’s see what Francisco says.
