Softmax confidence outputs in an SSD network are in a higher range for all anchor boxes

I used qfgaohao / pytorch-ssd (which has mobilenet-1-ssd and mobilenet-2-ssd-lite networks) to train on a custom dataset with 2 classes (one for background and one for the single object type in the dataset). I first modified the dataloader to load my custom dataset. I trained for 30 epochs, got an mAP of about 85%, and the results seemed good as a baseline. The softmax outputs also look good: for most anchor boxes the background class gets a high probability and the object class gets a low one. In this particular network, we get outputs from 6 different confidence heads.
E.g., while debugging I ran `F.softmax(confidences[3], dim=2)`. The output was:

tensor([[[0.9768, 0.0232],
[0.9870, 0.0130],
[0.9799, 0.0201],
[0.9844, 0.0156],
[0.9764, 0.0236],
[0.9813, 0.0187],
[0.9867, 0.0133],
[0.9884, 0.0116],
[0.9872, 0.0128],
[0.9907, 0.0093],
[0.9801, 0.0199],
[0.9821, 0.0179],
[0.9889, 0.0111],
[0.9872, 0.0128],
[0.9912, 0.0088],
[0.9825, 0.0175],
[0.9857, 0.0143],
[0.9703, 0.0297],
[0.9604, 0.0396],
[0.9536, 0.0464],
[0.9770, 0.0230],
[0.9749, 0.0251],
[0.9900, 0.0100],
[0.9703, 0.0297],
[0.9703, 0.0297],
[0.9674, 0.0326],
[0.9892, 0.0108],
[0.9872, 0.0128],
[0.9875, 0.0125],
[0.9742, 0.0258],
[0.9825, 0.0175],
[0.9739, 0.0261],
[0.9940, 0.0060],
[0.9780, 0.0220],
[0.9919, 0.0081],
[0.9521, 0.0479],
[0.9629, 0.0371],
[0.9741, 0.0259],
[0.9720, 0.0280],
[0.9738, 0.0262],
[0.9778, 0.0222],
[0.9709, 0.0291],
[0.9773, 0.0227],
[0.9692, 0.0308],
[0.9769, 0.0231],
[0.9823, 0.0177],
[0.9821, 0.0179],
[0.9762, 0.0238],
[0.9783, 0.0217],
[0.9781, 0.0219],
[0.9793, 0.0207],
[0.9798, 0.0202],
[0.9805, 0.0195],
[0.9638, 0.0362]]])

The sorted object-class scores from `torch.sort(F.softmax(confidences[0], dim=2)[0][:, 1])` are:

torch.return_types.sort(
values=tensor([6.7742e-04, 7.2989e-04, 7.3594e-04, …, 8.8332e-01, 9.0915e-01,
9.1045e-01
]),
indices=tensor([ 121, 103, 367, …, 1083, 1080, 969]))

In the output of the first head it detects the object class in most test images, and the separation between the scores is large as well: it outputs around 0.91 and 0.90 for the anchors where the object is, and very low scores (in the range of 1e-03 or 1e-04) for the remaining boxes where the object is not.
This output is reasonable, since the model detects background at most anchor locations. The other confidence heads behaved similarly: only a few anchor boxes got a high score for the object class, and the output made sense.

To train and experiment more, I integrated the repository into my custom training pipeline by modularizing the code so that a class exposes functions to build_net, train, and eval the model. I have not changed the files for building the model, preprocessing, or losses, except for a few functional changes. I can load models trained with the original implementation into my pipeline without problems. But now that I have trained mobilenet2-ssd-lite in my integrated code for 30 epochs, the softmax outputs from my pipeline are high for all anchor boxes in the object class. I again ran `F.softmax(confidences[3], dim=2)`; this is the output:

tensor([[[0.6760, 0.3240],
[0.6753, 0.3247],
[0.6753, 0.3247],
[0.6748, 0.3252],
[0.6757, 0.3243],
[0.6758, 0.3242],
[0.6747, 0.3253],
[0.6755, 0.3245],
[0.6742, 0.3258],
[0.6738, 0.3262],
[0.6728, 0.3272],
[0.6743, 0.3257],
[0.6751, 0.3249],
[0.6749, 0.3251],
[0.6743, 0.3257],
[0.6743, 0.3257],
[0.6742, 0.3258],
[0.6739, 0.3261],
[0.6752, 0.3248],
[0.6737, 0.3263],
[0.6746, 0.3254],
[0.6749, 0.3251],
[0.6749, 0.3251],
[0.6750, 0.3250],
[0.6765, 0.3235],
[0.6748, 0.3252],
[0.6748, 0.3252],
[0.6724, 0.3276],
[0.6756, 0.3244],
[0.6727, 0.3273],
[0.6747, 0.3253],
[0.6749, 0.3251],
[0.6747, 0.3253],
[0.6739, 0.3261],
[0.6741, 0.3259],
[0.6739, 0.3261],
[0.6767, 0.3233],
[0.6770, 0.3230],
[0.6768, 0.3232],
[0.6769, 0.3231],
[0.6775, 0.3225],
[0.6766, 0.3234],
[0.6770, 0.3230],
[0.6755, 0.3245],
[0.6751, 0.3249],
[0.6752, 0.3248],
[0.6752, 0.3248],
[0.6753, 0.3247],
[0.6756, 0.3244],
[0.6764, 0.3236],
[0.6754, 0.3246],
[0.6755, 0.3245],
[0.6751, 0.3249],
[0.6762, 0.3238]]])

The output of `torch.sort(F.softmax(confidences[0], dim=2)[0][:, 1])` is:

torch.return_types.sort(
values=tensor([0.1053, 0.1532, 0.1655, …, 0.6778, 0.7770, 0.8186]),
indices=tensor([1084, 970, 968, …, 969, 1080, 1083]))

So the model can still learn where the object is with relatively high confidence, but in the later heads it cannot properly tell where the object is not. In confidences[0], compared to the model trained with the original implementation above, the scores of the other anchors are not nearly as well separated. In confidences[1] through confidences[5], the object-class scores within each head are very close to one another but sit in a high range. Ideally the object-class confidences for background anchors should be around 0.0xxx, as they are with the original implementation. For example, in confidences[1] all object-class scores might be 0.29xx, and in confidences[3] (shown above) they are all around 0.32xx; this pattern repeats across heads. Sometimes they all sit around 0.4xxx instead. The values differ only in the third and fourth decimal place.
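To quantify this pattern while debugging, I've been dumping per-head statistics of the object-class scores. A minimal sketch, assuming `confidences` is the list of raw per-head outputs I index above, each of shape (batch, num_anchors, num_classes):

```python
import torch
import torch.nn.functional as F

# confidences: list of raw confidence-head outputs, one per head,
# each with shape (batch, num_anchors, num_classes)
for i, conf in enumerate(confidences):
    obj = F.softmax(conf, dim=2)[0][:, 1]   # object-class probability per anchor
    print(f"head {i}: min={obj.min().item():.4f}  max={obj.max().item():.4f}  "
          f"mean={obj.mean().item():.4f}  std={obj.std().item():.4f}")
```

Running this on both models makes the collapse obvious: with my integrated run, each head's scores differ only in the third or fourth decimal place.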

I'm getting an mAP of 71% with the integrated version. The consequence is that when I run inference on test images, NMS ends up outputting a lot of bounding boxes all over the image. If I increase the probability threshold to e.g. 0.5, the clustered boxes below 0.5 go away and the object class is still detected, but the model misses cases that the model trained with the original implementation could detect.
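For context, the inference post-processing I'm describing boils down to score thresholding followed by NMS; a rough sketch of that step (using torchvision's `nms` for illustration rather than the repo's own box_utils version):

```python
import torch
from torchvision.ops import nms

def filter_and_nms(boxes, scores, score_thresh=0.5, iou_thresh=0.45):
    """boxes: (N, 4) corner-form boxes, scores: (N,) object-class probabilities."""
    keep = scores > score_thresh             # drop low-confidence anchors first
    boxes, scores = boxes[keep], scores[keep]
    kept = nms(boxes, scores, iou_thresh)    # suppress overlapping boxes
    return boxes[kept], scores[kept]
```

With the collapsed scores, almost every anchor survives a low threshold, which is why so many boxes reach NMS in the first place.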

The losses from the original code also go lower. When I train from my integration the losses go down as well, but not as much. I used a cosine LR scheduler in both runs. I also trained mobilenet-1-ssd from my implementation, but the results are the same.
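The cosine schedule in both runs is just PyTorch's standard one, roughly like this (a sketch; the net, lr, and epoch count here are illustrative, not my exact settings):

```python
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import CosineAnnealingLR

net = nn.Conv2d(3, 8, 3)          # stand-in for the SSD network
num_epochs = 30
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)
scheduler = CosineAnnealingLR(optimizer, T_max=num_epochs)

for epoch in range(num_epochs):
    # ... run one training epoch here ...
    scheduler.step()              # decay the LR once per epoch
```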

My implementation losses: (loss plot attachment not shown here)

To me it looks like the layers for the confidence headers confidences[1] through confidences[5] are not learning, and even confidences[0] is not learning that well in my pipeline. I can't clearly tell where I'm going wrong; any suggestions for debugging such a problem are welcome. A PyTorch forum thread is the closest I've found to the current problem.
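One check I plan to run to confirm whether the confidence headers are actually moving: diff the classification-header weights between an early and a late checkpoint. A sketch, assuming the checkpoints are plain state_dicts and that the header parameters contain `classification_headers` in their names as in the repo's SSD module (the paths are placeholders):

```python
import torch

early = torch.load("checkpoint_epoch_1.pth", map_location="cpu")   # placeholder paths
late = torch.load("checkpoint_epoch_30.pth", map_location="cpu")

for name, w_early in early.items():
    if "classification_headers" in name and "weight" in name:
        delta = (late[name] - w_early).abs().mean().item()
        print(f"{name}: mean |delta| = {delta:.6f}")
```

If the deltas for the later headers stay near zero while the first one moves, that would line up with what I'm seeing in the softmax outputs.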

I don't know which changes were made, but you could try to isolate each change and check whether it has the expected effect.
Could you explain from a high-level perspective what your changes were supposed to do?

Also, I assume you are using the softmax for debugging?

  1. I stepped through the code in a debugger and it follows a structure similar to the original code for loading the model and data during training. I can load models trained with the original implementation into my pipeline and run some eval operations on them, and the pretrained weights provided for mobilenet1 and mobilenet2 also load correctly for training. But I can try again by isolating and checking each change as you suggested.

  2. The main vision folder, which contains the modules for the model, data, and losses, is almost the same in my integration. The only changes I made were:

  • Changed the data loading to load images and labels from my custom dataset file. The preprocessing applied to the images and labels loaded from my dataset is the same as in the original implementation.

  • Moved some of the tensors to the 'cpu' device, because the dataloader was failing when it tried to do multiprocessing with CUDA tensors and num_workers > 0. After moving the tensors to CPU, the error went away, since CPU processes are spawned for data loading and preprocessing (see the sketch after this list for the general pattern). Below is the diff from this file:

@@ -141,7 +143,7 @@
 class MatchPrior(object):
     def __init__(self, center_form_priors, center_variance, size_variance, iou_threshold):
         self.center_form_priors = center_form_priors
-        self.corner_form_priors = box_utils.center_form_to_corner_form(center_form_priors)
+        self.corner_form_priors = box_utils.center_form_to_corner_form(center_form_priors).to('cpu')
         self.center_variance = center_variance
         self.size_variance = size_variance
         self.iou_threshold = iou_threshold
@@ -149,11 +151,13 @@
     def __call__(self, gt_boxes, gt_labels):
         if type(gt_boxes) is np.ndarray:
             gt_boxes = torch.from_numpy(gt_boxes)
+            gt_boxes = gt_boxes.to('cpu')
         if type(gt_labels) is np.ndarray:
             gt_labels = torch.from_numpy(gt_labels)
+            gt_labels = gt_labels.to('cpu')
         boxes, labels = box_utils.assign_priors(gt_boxes, gt_labels,
                                                 self.corner_form_priors, self.iou_threshold)
-        boxes = box_utils.corner_form_to_center_form(boxes)
+        boxes = box_utils.corner_form_to_center_form(boxes).to('cpu')
         locations = box_utils.convert_boxes_to_locations(boxes, self.center_form_priors, self.center_variance, self.size_variance)
         return locations, labels
  • A related change was made in /vision/ssd/config/mobilenetv1_ssd_config.py:
@@ -20,4 +20,4 @@
 ]
 
 
-priors = generate_ssd_priors(specs, image_size)
\ No newline at end of file
+priors = generate_ssd_priors(specs, image_size).to('cpu')
\ No newline at end of file
  3. Yes, the softmax is used for debugging. It is applied only when the model runs in test mode to get probability outputs. I applied softmax to the individual feature maps output by the confidence headers of the various layers, which is when I discovered this discrepancy between the model trained with the original implementation and the one trained in my pipeline.
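As mentioned under the second point above, the CPU change is only about keeping dataset-side tensors off the GPU so DataLoader workers can be spawned safely. The general pattern I'm following is roughly this (a sketch, not the repo's actual code):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    # Keep everything on CPU inside the dataset / transforms so that
    # num_workers > 0 can spawn worker processes without touching CUDA.
    def __len__(self):
        return 64

    def __getitem__(self, idx):
        image = torch.rand(3, 300, 300)      # CPU tensor
        target = torch.tensor([idx % 2])     # CPU tensor
        return image, target

loader = DataLoader(ToyDataset(), batch_size=8, num_workers=2)
device = "cuda" if torch.cuda.is_available() else "cpu"

for images, targets in loader:
    # Move each batch to the GPU only inside the training loop.
    images, targets = images.to(device), targets.to(device)
```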

Weight decay was 0.1. It should have been 0.0005. My bad.
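For anyone hitting the same symptom, the fix was simply the optimizer's weight_decay argument (a sketch; the lr and momentum values here are illustrative):

```python
import torch.nn as nn
import torch.optim as optim

net = nn.Conv2d(3, 8, 3)   # stand-in for the SSD network
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9,
                      weight_decay=0.0005)   # was mistakenly 0.1 in my runs
```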