The challenge is that instance segmentation is a rather difficult
problem. It involves several logical steps (finding each object,
classifying it, and marking out its pixels), and instance-segmentation
models, at least as practiced, are built out of a number of
submodules – so such a model might not meet your definition of “simple.”
The best I can recommend – and it’s definitely not completely from
scratch with all the steps – is this PyTorch tutorial:
TorchVision Object Detection Finetuning Tutorial
It uses PyTorch’s (torchvision’s) prebuilt Mask R-CNN model. (There are other
instance-segmentation models, but they all share similar complexity.)
Here is the original Mask R-CNN article. It will give you a sense of the
complexity and submodules involved.
You could certainly build an instance-segmentation model totally from
scratch, but it would be a lot of work. If you want to, I would suggest
that you work through the original Mask R-CNN paper and the references
therein (or a similar paper about some other model), building the
submodules one by one (from scratch) and then putting them together.
But even if you decide to do this from scratch, I would recommend
first working through the not-from-scratch PyTorch tutorial to get an
overview of how things work and how some of the pieces fit together.