How to implement a transfer-learning-style training process?

I’ve been working on a UNet and have been advised to try a transfer-learning-style approach. My issue is that I can’t visualise the training procedure; I’ve confused myself by overthinking the instructions.

To quote the instructions I was given:

I was also thinking about something along the lines of a loss function based on higher-level features. A simple approach would be to use a pre-trained net to extract some features from an intermediate layer and then calculate another loss function based on the difference between the features extracted from the ground truth and from the output images.

I replied:
To confirm, first we get a pretrained model and feed it ground truth inputs, then we extract the resulting features from intermediate layers, and finally we calculate the loss between these extracted features and the output.

Response:
You feed the output of the network as well to the pre-trained model to extract the same features as with the ground truth and then compare them to calculate the loss.

My question is: how would you interpret this architecture or training process? I’ve added what I think below, but I’d love to hear your unbiased take.

From my understanding: I take my pretrained UNet and feed it the ground truth images; at the same time, I feed the same input to a fresh network I’m training; I then calculate a loss function based on the difference between the intermediate outputs of the pre-trained network and those of the fresh network. Is that right?
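To make my interpretation concrete, here is a rough sketch of the training step I have in mind. It's only a sketch of my reading, not what I was actually told to do: `TinyUNet` is a stand-in for my real UNet, and the choice of "intermediate features" (the encoder output) and the MSE feature loss are my own assumptions.

```python
# Sketch of my interpretation: compare intermediate features of a frozen
# pretrained UNet (fed the ground truth) with intermediate features of a
# fresh UNet (fed the input). All names/layers here are placeholders.
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Stand-in for my UNet; encode() returns intermediate features."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.decoder = nn.Conv2d(16, 3, 3, padding=1)

    def encode(self, x):
        return self.encoder(x)

    def forward(self, x):
        return self.decoder(self.encode(x))

pretrained = TinyUNet().eval()          # pretrained UNet, frozen
for p in pretrained.parameters():
    p.requires_grad_(False)

fresh = TinyUNet()                      # fresh network being trained
opt = torch.optim.Adam(fresh.parameters(), lr=1e-4)
feature_loss = nn.MSELoss()             # assumption: MSE between feature maps

def train_step(inputs, targets):
    opt.zero_grad()
    # Intermediate features of the pretrained net given the ground truth...
    with torch.no_grad():
        target_feats = pretrained.encode(targets)
    # ...vs. intermediate features of the fresh net given the input.
    pred_feats = fresh.encode(inputs)
    loss = feature_loss(pred_feats, target_feats)
    loss.backward()
    opt.step()
    return loss.item()

# Dummy data just to show the shapes involved
x = torch.randn(2, 3, 64, 64)   # input images
y = torch.randn(2, 3, 64, 64)   # ground truth images
print(train_step(x, y))
```

I'm not confident this matches what was intended, in particular whether the fresh network's output should also be passed through the pretrained model, so any correction to this sketch would help.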

Also, if you know of any similar examples of such a setup, please share, as that would massively help.