Need help understanding the terms in the output of a Faster R-CNN model

StrafeNDestroy · July 7, 2022, 2:43am

Hello, I’m training the fasterrcnn50 model and need some help with the terms that are output during the training and evaluation phases. These values come from passing a single batch of size 3 through the model, once in model.train() mode and once in model.eval() mode. I just wanted to know what the outputs of the model look like.
Training
What do the terms loss_box_reg, loss_classifier, loss_objectness and loss_rpn_box_reg mean?

{'loss_box_reg': tensor(0.1667, device='cuda:0', grad_fn=<DivBackward0>),
 'loss_classifier': tensor(0.1150, device='cuda:0', grad_fn=<NllLossBackward0>),
 'loss_objectness': tensor(0.0133, device='cuda:0',
        grad_fn=<BinaryCrossEntropyWithLogitsBackward0>),
 'loss_rpn_box_reg': tensor(0.0077, device='cuda:0', grad_fn=<DivBackward0>)}

Evaluation
Why are my boxes, labels, and scores so big? My boxes should only contain x1,x2,y1,y2, and labels should only be one of three options, [“truck”,“car”,“jeep”], i.e. [0,1,2] respectively. And shouldn’t scores be a tensor of size one holding a single score?

[{'boxes': tensor([[ 180.0518,  495.6577, 1024.0000,  908.9251],
          [ 185.0601,  103.9851, 1024.0000,  518.3134],
          [ 182.5600,  338.2882, 1024.0000,  752.6675],
          [ 471.0854,  211.9680, 1024.0000, 1003.5166],
          [  83.2046,  226.3543,  810.2725, 1024.0000],
          [   0.0000,  582.8847,  895.5603,  989.6712],
          [ 651.6221,  118.1111, 1024.0000, 1024.0000],
          [ 229.5882,  396.0160, 1024.0000,  846.7419],
          [ 233.9064,   82.7686, 1024.0000,  534.3818],
          [ 232.9609,  238.4104, 1024.0000,  691.0627],
          [ 231.0070,  548.6355, 1024.0000, 1007.2180],
          [ 123.2340,  173.6318,  767.1678, 1024.0000],
          [   0.0000,  311.9489,  517.6907, 1024.0000],
          [   0.0000,  473.5332,  904.4132,  928.5795],
          [   0.0000,  242.5558,  473.8981, 1024.0000],
          [   0.0000,   82.2973,  707.3653,  722.0305],
          [ 347.9162,   41.3262, 1008.5706,  907.2560]], device='cuda:0',
         grad_fn=<StackBackward0>),
  'labels': tensor([1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 2], device='cuda:0'),
  'scores': tensor([0.2119, 0.2107, 0.2102, 0.1983, 0.1857, 0.1548, 0.1534, 0.1534, 0.1521,
          0.1512, 0.1456, 0.1445, 0.1405, 0.1396, 0.1107, 0.0851, 0.0630],
         device='cuda:0', grad_fn=<IndexBackward0>)},
 {'boxes': tensor([[ 344.3575,   96.7017, 1024.0000,  930.0717],
          [  95.6406,  417.0934, 1024.0000,  830.2409],
          [  96.3330,  260.8725, 1024.0000,  673.8717],
          [   0.0000,  110.8532,  872.5706,  519.8826],
          [  96.9932,  576.4453, 1024.0000,  988.0748],
          [   0.0000,  250.0097,  702.8840, 1024.0000],
          [ 382.6677,   54.5672, 1024.0000,  962.0132],
          [ 148.8957,  396.1225, 1024.0000,  846.1454],
          [ 149.7371,  239.8830, 1024.0000,  689.7913],
          [ 150.1571,  549.3901, 1024.0000, 1007.3665],
          [ 200.0044,  173.0603,  845.9658, 1024.0000],
          [   0.0000,  315.7430,  822.4493,  771.3001],
          [   0.0000,   82.2462,  823.9964,  536.9340],
          [   0.0000,  405.8919,  724.5031, 1024.0000],
          [   0.0000,   93.0665,  466.7580, 1024.0000]], device='cuda:0',
         grad_fn=<StackBackward0>),
  'labels': tensor([1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2], device='cuda:0'),
  'scores': tensor([0.2142, 0.2123, 0.2114, 0.1902, 0.1848, 0.1574, 0.1561, 0.1537, 0.1531,
          0.1462, 0.1454, 0.1438, 0.1429, 0.1197, 0.1187], device='cuda:0',
         grad_fn=<IndexBackward0>)},
 {'boxes': tensor([[ 183.7025,  415.5847, 1024.0000,  830.7776],
          [ 187.0761,  257.8963, 1024.0000,  672.8000],
          [ 427.8109,  137.0997, 1024.0000, 1024.0000],
          [ 169.4555,  228.0593,  885.1265, 1024.0000],
          [  20.9872,  575.1481, 1024.0000,  989.0760],
          [   0.0000,  188.6652,  869.6748,  598.1370],
          [ 459.9109,   84.2036, 1024.0000, 1024.0000],
          [ 121.1884,  408.6773, 1024.0000,  975.6856],
          [ 235.6789,  314.0932, 1024.0000,  768.0829],
          [ 147.8334,   74.9042, 1024.0000,  633.0103],
          [   0.0000,  322.5314,  505.9696, 1024.0000],
          [   0.0000,  185.5542,  643.5510, 1024.0000],
          [   0.0000,   33.5092,  413.7768,  889.9094]], device='cuda:0',
         grad_fn=<StackBackward0>),
  'labels': tensor([1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 1, 2, 2], device='cuda:0'),
  'scores': tensor([0.2143, 0.2107, 0.2017, 0.1900, 0.1874, 0.1861, 0.1593, 0.1576, 0.1543,
          0.1518, 0.1443, 0.1272, 0.0540], device='cuda:0',
         grad_fn=<IndexBackward0>)}]

tslavik · October 12, 2022, 6:09pm

For the terms, this Stack Exchange post defines the meanings.
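
In short, as far as I understand torchvision's Faster R-CNN, the four entries come from the model's two stages. loss_objectness and loss_rpn_box_reg belong to the region proposal network (is there an object at this anchor, and how far off is the proposed box), while loss_classifier and loss_box_reg belong to the detection head (which class is each proposal, and how far off is the refined box). Training code usually adds the four together into one scalar. A minimal sketch of that step, using the numbers from the question as plain floats instead of tensors so it runs without a model:

```python
# Values copied from the training output in the question, as plain floats.
# In a real loop, loss_dict = model(images, targets) holds tensors instead.
loss_dict = {
    "loss_box_reg": 0.1667,      # detection head: box refinement
    "loss_classifier": 0.1150,   # detection head: class of each proposal
    "loss_objectness": 0.0133,   # RPN: object vs. background per anchor
    "loss_rpn_box_reg": 0.0077,  # RPN: proposal box regression
}

# Sum the four terms into the single scalar that is usually optimised.
total_loss = sum(loss_dict.values())
print(f"total loss: {total_loss:.4f}")
```

With real tensors the same sum gives a tensor, and you would call total_loss.backward() followed by optimizer.step().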

For the outputs, it looks like you had three images predicted. Each ‘boxes’ tensor has so many rows because the model detected that many instances in the image, and ‘labels’ and ‘scores’ hold one entry per detected box. In the first image it detected label 1 eight times and label 2 nine times.
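
Note that every score in your output is below about 0.22, so most of these boxes are low-confidence candidates rather than firm detections. A common step is to keep only the detections above a score threshold. The sketch below does that on the first six detections of your first image; filter_detections and the 0.2 threshold are names and values I made up for illustration, not part of torchvision. Each row of ‘boxes’ is [x1, y1, x2, y2].

```python
import torch


def filter_detections(prediction, score_threshold=0.5):
    """Keep only detections whose score is at least score_threshold.

    Hypothetical helper for illustration; not a torchvision function.
    """
    keep = prediction["scores"] >= score_threshold  # boolean mask, one per box
    return {key: value[keep] for key, value in prediction.items()}


# First six detections of the first image in the question (values copied).
prediction = {
    "boxes": torch.tensor([
        [180.0518, 495.6577, 1024.0000, 908.9251],
        [185.0601, 103.9851, 1024.0000, 518.3134],
        [182.5600, 338.2882, 1024.0000, 752.6675],
        [471.0854, 211.9680, 1024.0000, 1003.5166],
        [83.2046, 226.3543, 810.2725, 1024.0000],
        [0.0000, 582.8847, 895.5603, 989.6712],
    ]),
    "labels": torch.tensor([1, 1, 1, 1, 1, 1]),
    "scores": torch.tensor([0.2119, 0.2107, 0.2102, 0.1983, 0.1857, 0.1548]),
}

# The threshold of 0.2 is arbitrary; pick one that suits your data.
filtered = filter_detections(prediction, score_threshold=0.2)
print(filtered["boxes"].shape)    # one row of [x1, y1, x2, y2] per kept box
print(filtered["labels"].tolist())
```

Two more things that may help. The grad_fn entries in your evaluation output suggest inference ran with gradients enabled, so wrapping the call in torch.no_grad() should save memory. Also, torchvision's detection models reserve label 0 for background, which would explain why only labels 1 and 2 appear; with three classes you would normally number them 1 to 3 and build the model with num_classes=4.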