Regression Net Architecture Problem

Hey, it is me again :confused:

So I am trying to predict two separate coordinates on an input image. I use CNN layers and a fully connected layer that projects onto 4 output nodes, with a sigmoid activation at the end (to get the values between 0 and 1).

My output is a tensor of size (batch_size, 4), where 4 --> [x1, y1, x2, y2] are the coordinates of the predicted points.

So I am training, and my loss already gets stuck after epoch 1. I tried increasing the learning rate etc. My predicted output always seems to be close to [0.5, 0.5, 0.5, 0.5] and I do not get why. The labels are also right.
Is sigmoid the right function for predicting these values?

Btw my loss is MSELoss

Hope you can help me!

Raph :slight_smile:

I would remove the sigmoid, to be honest, since your model might have a hard time predicting values close to 0 and 1.
IIRC, MSE loss also has a drawback when combined with a sigmoid or softmax: the gradient is multiplied by the derivative of the activation, which shrinks towards zero as the output saturates, so the weight updates can stall.
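
To make the stalling argument concrete, here is a minimal sketch in plain Python (no PyTorch needed). It evaluates the gradient of a squared error with respect to the pre-sigmoid value `z` for a single output. The function names are made up for illustration, and the target of 0.0 is just an example value.

```python
import math


def sigmoid(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def mse_grad_wrt_logit(z, target):
    """Derivative of (sigmoid(z) - target)**2 with respect to z.

    Chain rule: 2 * (s - target) * s * (1 - s), where s = sigmoid(z).
    The factor s * (1 - s) is the sigmoid's own derivative.
    """
    s = sigmoid(z)
    return 2.0 * (s - target) * s * (1.0 - s)


# Target is 0.0, so a larger z means a worse prediction,
# yet the gradient shrinks as the sigmoid saturates.
for z in (0.0, 2.0, 5.0, 10.0):
    s = sigmoid(z)
    g = mse_grad_wrt_logit(z, target=0.0)
    print(f"z={z:5.1f}  output={s:.5f}  grad={g:.6f}")
```

The prediction gets worse as `z` grows, but the gradient gets smaller, which is the situation where updates can stall.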

Alright! Thanks. I need the outputs to be between 0 and 1 though, because my labeled coordinates are also normalized :confused: x1, y1 = real_x1 / width_image, real_y1 / height_image
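
For reference, the normalization described here, and its inverse, can be sketched in plain Python as below. The helper names are hypothetical. The optional clamping is an assumption about how one might handle a model without a final sigmoid, whose raw outputs can fall slightly outside [0, 1]. It is applied only when converting predictions back to pixels, not during training.

```python
def normalize_points(points, width, height):
    """Map pixel coordinates (x, y) to the [0, 1] range."""
    return [(x / width, y / height) for x, y in points]


def clamp01(v):
    """Clip a value to the interval [0, 1]."""
    return min(1.0, max(0.0, v))


def denormalize_points(points, width, height, clamp=True):
    """Map normalized predictions back to pixel coordinates.

    With clamp=True, predictions outside [0, 1] are clipped to the
    image border first (only meant for inference-time post-processing).
    """
    out = []
    for x, y in points:
        if clamp:
            x, y = clamp01(x), clamp01(y)
        out.append((x * width, y * height))
    return out


# Example with a hypothetical 200x100 crop: snout and tail in pixels.
labels = normalize_points([(50, 100), (200, 25)], width=200, height=100)
pixels = denormalize_points(labels, width=200, height=100)
```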

IIRC you are trying to track mice in a lab setting. Are they often directly on the border?
I would still try removing the sigmoid from the last layer to see if the predictions get any better.

Yep, the thing is I tracked the mice with the same model; now I am trying to find their snout and tail on a cropped image, where they are usually placed somewhere around the center. All I did was change the output to 4 nodes instead of 2. Also, my dataset (due to manual labeling) is only 2,000 images compared to 120,000 before.
Do I need to replace the sigmoid with some other kind of activation?

So I just got rid of the Sigmoid and this happened, does not look really promising :frowning: learning rate is 0.01

Epoch 1/1000
train Loss: 0.007419419793224839, Acc: 0.0
val Loss: 0.0018860326872931587, Acc: 0.0

Epoch 2/1000
train Loss: 0.0018890846087030633, Acc: 0.0
val Loss: 0.0018535365247064166, Acc: 0.0

Epoch 3/1000
train Loss: 0.0018744729794364758, Acc: 0.0
val Loss: 0.0018500339653756883, Acc: 0.0

Epoch 4/1000
train Loss: 0.0018520522586725377, Acc: 0.0
val Loss: 0.0018364218788014517, Acc: 0.0

Using Adadelta() optimizer I get weird behaviours like this:

[[ 1.0000e+00,  4.1560e-39],
 [ 4.1560e-39,  1.0000e+00]] # <-- in one batch all outputs look similar to this

[[ 0.4993,  0.5002],
 [ 0.4964,  0.5007]] # <-- in next batch similar to this

the outputs seem to jump from batch to batch between around [[1, 0], [0, 1]] to around [[0.5, 0.5], [0.5, 0.5]]
This I find very weird…and I cannot explain why, maybe you have an idea. I tried messing around with the batch size which gave no improvements at all.

It looks like the predictions jump from [0, 0] to [0.5, 0.5].
Are you using the same model but with another last layer?
Did your previous model learn the correct regression using a sigmoid at the end?
Did you think about using your pre-trained model and just add an additional “snout/tail” layer?
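
A minimal sketch of the "reuse the pre-trained model" idea: copy every weight whose name and shape match, and leave the new 4-output layer freshly initialized. The tiny `make_net` architecture is purely hypothetical and stands in for the real tracking model, which is not shown in this thread. In practice `old_model` would be loaded from a checkpoint.

```python
import torch
import torch.nn as nn


def make_net(num_outputs):
    """Hypothetical stand-in for the tracking CNN (not the real model)."""
    return nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(8, num_outputs),
    )


old_model = make_net(2)  # pretend this one was trained on mouse position
new_model = make_net(4)  # snout (x1, y1) and tail (x2, y2)

# Keep only the parameters whose name and shape match in both models.
new_state = new_model.state_dict()
compatible = {
    name: tensor
    for name, tensor in old_model.state_dict().items()
    if name in new_state and tensor.shape == new_state[name].shape
}

# strict=False lets the mismatched last layer keep its fresh initialization.
result = new_model.load_state_dict(compatible, strict=False)
print("left at fresh init:", result.missing_keys)
```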

Yeah, the old model worked with the sigmoid at the end. Like I said, everything is similar, but instead of projecting onto 2 neurons in the last fully connected layer, I project onto 4 neurons in the new model :confused:.
I am using the old model to track the mouse, then I crop the image (where the mouse is around the centre now), I use this cropped image as input for the model I am building atm. Gonna try with the loaded weights, did not think of that thx.

That would be one idea. Another would be to use just one model with two different output layers.
One for the global mouse position, the other one for the snout and tail.
Let me know how your experiments work out!
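
A rough sketch of what "one model with two output layers" could look like in PyTorch. The layer sizes, the class name `TwoHeadNet`, and the unweighted sum of the two losses are all assumptions for illustration. Note that this variant predicts everything from the full frame, which differs from the crop-then-predict pipeline described earlier.

```python
import torch
import torch.nn as nn


class TwoHeadNet(nn.Module):
    """Shared CNN trunk with two regression heads (hypothetical sizes)."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.position_head = nn.Linear(32, 2)  # global mouse position (x, y)
        self.keypoint_head = nn.Linear(32, 4)  # snout and tail (x1, y1, x2, y2)

    def forward(self, x):
        feats = self.features(x)
        return self.position_head(feats), self.keypoint_head(feats)


model = TwoHeadNet()
criterion = nn.MSELoss()

# Random tensors stand in for a real batch of images and labels.
images = torch.randn(8, 3, 64, 64)
position_target = torch.rand(8, 2)
keypoint_target = torch.rand(8, 4)

position, keypoints = model(images)
loss = criterion(position, position_target) + criterion(keypoints, keypoint_target)
loss.backward()
```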

Gonna try the two output layer method. Thanks! I won’t be giving up too soon :smiley:

I ended up using ResNet34 for the task with the same hyperparameters. It worked like a charm, and I implemented your advice. Thank you @ptrblck

Hi, I'm new to regression tasks.
Could you please explain how to use ResNet34 for regression, where the input is an image and the output is a coordinate?
Thanks for your reply.