How to pretrain a CNN?

Hey there! Awesome Community here, I love it.
I hope you can help me understand the concept of pretraining.
My problem: I would like to track mouse movement for a neurological lab experiment.

What I have done so far: I built a simple dataset of mice in a lab environment (3000 images). Now I would like to detect them (a simple detection on a 7x7 grid), so I am using a simplified version of the YOLO object detection algorithm.

My question: I would like to pretrain the model on ImageNet’s mouse images. Since that is a classification task, I am not sure how to implement this correctly. I thought of isolating the convolutional layers and training them to classify mouse images. Is this correct, and would it do the job?

If you have any other ideas on how to approach the problem, let me know.

THANKS in advance!

I assume you have created the ground truth annotations for the 3000 images?
If so, you could try to just use a pre-trained ImageNet model, remove the last classification layer, add your position regression layer to the model, and retrain it.
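Here is a minimal sketch of that idea, assuming PyTorch/torchvision and a 49-value (7x7) output head; the classifier layout below is torchvision's default VGG:

```python
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained backbone.
model = models.vgg16_bn(pretrained=True)

# Replace the final classification layer (1000 ImageNet classes)
# with a layer that outputs one value per grid cell (7x7 = 49).
in_features = model.classifier[-1].in_features
model.classifier[-1] = nn.Linear(in_features, 49)

# Optionally freeze the convolutional layers and train only the new head at first.
for param in model.features.parameters():
    param.requires_grad = False
```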

If you are starting from an untrained YOLO model, you could probably pre-train it on mouse/non-mouse images first and then apply the procedure described above.
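A rough sketch of that two-stage idea; the tiny backbone below is only a hypothetical stand-in for your actual YOLO conv layers:

```python
import torch.nn as nn

# Hypothetical stand-in for the YOLO-style conv layers; use your own backbone here.
backbone = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)

# Stage 1: pre-train on mouse / non-mouse classification (2 classes).
pretrain_model = nn.Sequential(backbone, nn.Linear(32, 2))
# ... train pretrain_model on mouse / non-mouse images ...

# Stage 2: reuse the (now pre-trained) backbone and attach the 7x7 grid head.
detector = nn.Sequential(backbone, nn.Linear(32, 49))
# ... fine-tune detector on the 3000 annotated lab images ...
```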

Would this work for you?

Does the 7x7 grid (I think this is your output) have a binary representation?
I mean, the entries are either 0 or 1?

Yes, I have created the ground truth (1 if the mouse is in the grid cell, 0 if it is not, and there is only one 1 per image).
I am currently trying a VGG16 pretrained on ImageNet and will do as you said. I will let you know if it works!
Thanks for your help!

EDIT:
Btw, I know I need to preprocess all images according to “Preprocess for pretrained VGG_bn model”. Do I need the same preprocessing for my 3000 images, or do I have to calculate the statistics from my own data?
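For reference, what I mean by the standard preprocessing is (if I understand it correctly) the usual ImageNet transform used for torchvision’s pretrained models:

```python
from torchvision import transforms

# Resize/crop plus normalization with the ImageNet mean and std.
imagenet_preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```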

Yes it is binary, and there is only one detection (one 1) per image.
E.g. a ground truth matrix:
0 0 0 0 0 0 0
0 0 0 0 1 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0

My Labeling Software works like this:
ROI → green
detection grid → red
[screenshot of the labelling software]

In fact, you could solve your problem as a classification problem.
You could then map the 49 grid cells to 49 labels:

labels = tuple(str(i) for i in range(49))                     # '0', '1', ..., '48'
label_mapping = {str(i): (i // 7, i % 7) for i in range(49)}  # '0': (0, 0), ..., '48': (6, 6)

I urge you to investigate the balance of the data. If the classes are not balanced, you need to balance them, or consider only the inner 6x6 or 5x5 region (but check whether those are balanced too).

What do you mean by balanced data?
What I am currently doing is having a ‘classification’ conv layer at the end that produces one output for each grid cell. My loss function is the squared Euclidean distance between the predicted and the ground-truth grid. I haven’t tried it out yet, but I’m going to post an update once the network is trained.

EDIT
What kind of loss function would you use for the classification?

Unbalanced data: you have more image examples whose labels fall at certain (x, y) locations than at other (x, y) locations. This will probably bias your classifier towards the locations that appear more often.
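A quick way to check this, assuming the ground truth is stored as 7x7 binary matrices like in your example (ground_truth_matrices is a hypothetical name for your list of arrays):

```python
from collections import Counter
import numpy as np

def cell_of(gt_matrix):
    """Return the (row, col) of the single 1 in a 7x7 ground-truth matrix."""
    row, col = np.argwhere(gt_matrix == 1)[0]
    return int(row), int(col)

# ground_truth_matrices: your list of 7x7 label arrays (hypothetical name).
counts = Counter(cell_of(gt) for gt in ground_truth_matrices)
print(counts.most_common())  # see which grid cells dominate
```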
Loss function: CrossEntropy is highly recommended.
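Something like this, assuming the model outputs 49 raw logits per image and the 7x7 ground-truth matrix is converted to a single class index (model, images and gt_matrices are placeholders for your own tensors):

```python
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

logits = model(images)                      # shape: (batch, 49), raw scores
targets = gt_matrices.flatten(1).argmax(1)  # (batch, 7, 7) -> class index 0..48
loss = criterion(logits, targets)
```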

Yeah, this is the case. The first network I trained was clearly overfitted, because it predicted the same grid cell on every pass. I have augmented the data with random flips and random 90-degree rotations.
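One thing to watch out for: the 7x7 label grid has to be flipped/rotated together with the image, otherwise the annotation no longer matches. A minimal sketch, assuming both the image (H x W x C) and the grid are numpy arrays:

```python
import random
import numpy as np

def augment(image, grid):
    # Random horizontal flip, applied to image and label grid alike.
    if random.random() < 0.5:
        image = np.flip(image, axis=1)
        grid = np.flip(grid, axis=1)
    # Random rotation by a multiple of 90 degrees.
    k = random.randint(0, 3)
    image = np.rot90(image, k, axes=(0, 1))
    grid = np.rot90(grid, k)
    return image.copy(), grid.copy()
```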