Advice on approach

Hi everyone,
A very fresh newbie here :slight_smile:

I’m a baby researcher haha,

I need advice on how to go about this problem.

I have a dataset of ~200k cropped eye images with corresponding landmark labels (one JSON file per image, so ~200k JSON files).
I want an end-to-end deep learning model that tracks the corresponding landmarks in real time.

I was thinking I could use already-trained ResNets, DenseNets, etc. for this task via transfer learning. The problem is I’m confused about how to go about it. The usual pipeline is to run face detection to find the ROI, then a facial landmark detector to get the corresponding eye landmarks on the face.

Can someone give me a pointer on how to use transfer learning from an already-trained CNN? I’m wondering whether transfer learning is even possible here, since most of the popular CNN architectures were trained on ImageNet, which is an image classification task, whereas I need to predict landmark coordinates.