Simple (generalizable) Animal Pose Estimation Baseline

I am looking for animal pose estimation code that can be trained from scratch on 100-500 annotated frames, with roughly 10-12 keypoints per frame, and that predicts the pose with deep learning on a frame-by-frame basis. Does any such code exist? I am not looking for a full toolkit like DeepLabCut, DeepPoseKit, or LEAP. I want a simple baseline written in PyTorch that I can easily modify to add my own loss functions or regularizers.
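
To make the request concrete, here is a rough sketch of the kind of baseline meant above. It is not taken from any existing tool. It assumes a heatmap-regression formulation (one Gaussian heatmap per keypoint, MSE loss), which is only one common way to set this up, and the names `TinyPoseNet`, `gaussian_heatmaps`, `pose_loss`, and `decode_keypoints` are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyPoseNet(nn.Module):
    """Hypothetical baseline: small conv encoder plus a 1x1 head, one heatmap per keypoint."""

    def __init__(self, num_keypoints: int = 12):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        )
        self.head = nn.Conv2d(64, num_keypoints, kernel_size=1)

    def forward(self, x):
        # x: (B, 3, H, W) -> heatmaps: (B, K, H/4, W/4) for H, W divisible by 4
        return self.head(self.encoder(x))


def gaussian_heatmaps(coords, size, sigma=2.0):
    """Render targets. coords: (B, K, 2) as (x, y) in heatmap pixels -> (B, K, H, W)."""
    h, w = size
    ys = torch.arange(h, dtype=coords.dtype, device=coords.device).view(1, 1, h, 1)
    xs = torch.arange(w, dtype=coords.dtype, device=coords.device).view(1, 1, 1, w)
    cx = coords[..., 0].unsqueeze(-1).unsqueeze(-1)
    cy = coords[..., 1].unsqueeze(-1).unsqueeze(-1)
    return torch.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))


def pose_loss(pred, target, extra_terms=()):
    """MSE on heatmaps plus optional (weight, fn) pairs for custom losses or regularizers."""
    loss = F.mse_loss(pred, target)
    for weight, fn in extra_terms:
        loss = loss + weight * fn(pred, target)
    return loss


def decode_keypoints(heatmaps):
    """Take the argmax of each heatmap. Returns (B, K, 2) integer (x, y) positions."""
    w = heatmaps.shape[-1]
    idx = heatmaps.flatten(2).argmax(dim=2)
    ys = torch.div(idx, w, rounding_mode="floor")
    return torch.stack((idx % w, ys), dim=2)
```

A training step under these assumptions would be the usual `loss = pose_loss(model(frames), targets, extra_terms=[...])`, `loss.backward()`, `optimizer.step()`. The `extra_terms` hook is where a custom loss or regularizer would plug in. With only a few hundred frames, augmentation or a pretrained encoder would probably matter, but that is outside this sketch.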