RetinaNet-101 Not Training Well, Not Learning

Hello folks,

I am trying to train a RetinaNet-101 model on a single class. I managed to get the dataset into the correct COCO format and set the images up correctly.

The model runs, and I let it train for over 100 epochs, but the results are the same as at epoch 0.
Here is a snippet of the training log that shows what I mean. Does anyone know what the issue could be, or where I would even begin to diagnose this problem?

Epoch: 128 | Iteration: 0 | Classification loss: 0.00000 | Regression loss: 0.00000 | Running loss: 0.22222
Epoch: 128 | Iteration: 1 | Classification loss: 0.00000 | Regression loss: 0.00000 | Running loss: 0.22222
Epoch: 128 | Iteration: 2 | Classification loss: 0.00000 | Regression loss: 0.00000 | Running loss: 0.22222
Epoch: 128 | Iteration: 3 | Classification loss: 0.00000 | Regression loss: 0.00000 | Running loss: 0.22222
Epoch: 128 | Iteration: 4 | Classification loss: 0.00000 | Regression loss: 0.00000 | Running loss: 0.22222
Epoch: 128 | Iteration: 5 | Classification loss: 0.00000 | Regression loss: 0.00000 | Running loss: 0.22222
Epoch: 128 | Iteration: 6 | Classification loss: 0.00000 | Regression loss: 0.00000 | Running loss: 0.22222
Epoch: 128 | Iteration: 7 | Classification loss: 0.00000 | Regression loss: 0.00000 | Running loss: 0.22222
Epoch: 128 | Iteration: 8 | Classification loss: 0.00000 | Regression loss: 0.00000 | Running loss: 0.22222
Epoch: 128 | Iteration: 9 | Classification loss: 2.30212 | Regression loss: 0.37198 | Running loss: 0.22757
Epoch: 128 | Iteration: 10 | Classification loss: 0.00000 | Regression loss: 0.00000 | Running loss: 0.22272
Epoch: 128 | Iteration: 11 | Classification loss: 0.00000 | Regression loss: 0.00000 | Running loss: 0.22272
Epoch: 128 | Iteration: 12 | Classification loss: 2.30212 | Regression loss: 0.29100 | Running loss: 0.22790
Epoch: 128 | Iteration: 13 | Classification loss: 0.00000 | Regression loss: 0.00000 | Running loss: 0.22790
Epoch: 128 | Iteration: 14 | Classification loss: 0.00000 | Regression loss: 0.00000 | Running loss: 0.22790
Epoch: 128 | Iteration: 15 | Classification loss: 0.00000 | Regression loss: 0.00000 | Running loss: 0.22790
Epoch: 128 | Iteration: 16 | Classification loss: 0.00000 | Regression loss: 0.00000 | Running loss: 0.22790
Epoch: 128 | Iteration: 17 | Classification loss: 0.00000 | Regression loss: 0.00000 | Running loss: 0.22790
Epoch: 128 | Iteration: 18 | Classification loss: 0.00000 | Regression loss: 0.00000 | Running loss: 0.22790
Epoch: 128 | Iteration: 19 | Classification loss: 0.00000 | Regression loss: 0.00000 | Running loss: 0.22790
Epoch: 128 | Iteration: 20 | Classification loss: 0.00000 | Regression loss: 0.00000 | Running loss: 0.22790
Epoch: 128 | Iteration: 21 | Classification loss: 0.00000 | Regression loss: 0.00000 | Running loss: 0.22790
Epoch: 128 | Iteration: 22 | Classification loss: 0.00000 | Regression loss: 0.00000 | Running loss: 0.22790
Epoch: 128 | Iteration: 23 | Classification loss: 0.00000 | Regression loss: 0.00000 | Running loss: 0.22790
Epoch: 128 | Iteration: 24 | Classification loss: 0.00000 | Regression loss: 0.00000 | Running loss: 0.22790
Epoch: 128 | Iteration: 25 | Classification loss: 0.00000 | Regression loss: 0.00000 | Running loss: 0.22790
Epoch: 128 | Iteration: 26 | Classification loss: 0.00000 | Regression loss: 0.00000 | Running loss: 0.22790
Epoch: 128 | Iteration: 27 | Classification loss: 0.00000 | Regression loss: 0.00000 | Running loss: 0.22790
Epoch: 128 | Iteration: 28 | Classification loss: 0.00000 | Regression loss: 0.00000 | Running loss: 0.22790
Epoch: 128 | Iteration: 29 | Classification loss: 0.00000 | Regression loss: 0.00000 | Running loss: 0.22790
Epoch: 128 | Iteration: 30 | Classification loss: 0.00000 | Regression loss: 0.00000 | Running loss: 0.22304

Double post from here.

In a way it’s similar, but the solution there was to stop using Adam and change the learning rate. I’m trying to recreate a model from a paper that used Adam. Any ideas how to get this working?
There is another implementation of RetinaNet here, but it’s unsupported now. GitHub - fizyr/keras-retinanet: Keras implementation of RetinaNet object detection.

It would be good if I could use PyTorch. Surely it’s capable of training a RetinaNet model using Adam and a ResNet-101 backbone.
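
One place to start diagnosing: in the log above, most iterations report exactly zero for both losses. Some RetinaNet implementations return a zero loss for an image that has no ground-truth boxes, so this pattern could mean that most images reach the loss with no annotations attached. That is only an assumption about the training code, not something the log proves. A minimal sketch of a check on the annotation file itself, using only the standard COCO keys (`images`, `annotations`, `categories`, `bbox` as `[x, y, width, height]`); the function name `annotation_coverage` and the toy data are made up for illustration:

```python
from collections import Counter


def annotation_coverage(coco):
    """Summarise how many images in a COCO-style dict carry usable boxes."""
    image_ids = {img["id"] for img in coco.get("images", [])}
    category_ids = {cat["id"] for cat in coco.get("categories", [])}

    boxes_per_image = Counter()
    bad_boxes = 0          # width or height <= 0
    unknown_category = 0   # category_id not listed under "categories"
    orphaned = 0           # image_id not listed under "images"

    for ann in coco.get("annotations", []):
        if ann["image_id"] not in image_ids:
            orphaned += 1
            continue
        if ann["category_id"] not in category_ids:
            unknown_category += 1
            continue
        _, _, w, h = ann["bbox"]  # COCO boxes are [x, y, width, height]
        if w <= 0 or h <= 0:
            bad_boxes += 1
            continue
        boxes_per_image[ann["image_id"]] += 1

    return {
        "images": len(image_ids),
        "images_with_boxes": len(boxes_per_image),
        "images_without_boxes": len(image_ids) - len(boxes_per_image),
        "bad_boxes": bad_boxes,
        "unknown_category": unknown_category,
        "orphaned": orphaned,
    }


# Toy data for illustration only: image 1 has a valid box,
# image 2 has a zero-width box, image 3 has no annotation at all.
toy = {
    "images": [{"id": 1}, {"id": 2}, {"id": 3}],
    "categories": [{"id": 1, "name": "thing"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [5, 5, 20, 30]},
        {"id": 11, "image_id": 2, "category_id": 1, "bbox": [5, 5, 0, 30]},
    ],
}
report = annotation_coverage(toy)
print(report)
```

For the real dataset, the dict would come from `json.load` on the training annotation file. If `images_without_boxes` is a large share of `images`, or `unknown_category` and `bad_boxes` are non-zero, the data loading is worth checking before changing the optimizer or the learning rate.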