PyTorch Object Detection Finetuning Tutorial metrics

Hi there, I am following the TorchVision Object Detection Finetuning Tutorial (PyTorch Tutorials 2.3.0+cu121 documentation), and the evaluation step using the CocoEvaluator class prints something like this:

Downloading: “https://download.pytorch.org/models/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth” to /var/lib/ci-user/.cache/torch/hub/checkpoints/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth

0%| | 0.00/170M [00:00<?, ?B/s]
23%|##3 | 39.6M/170M [00:00<00:00, 415MB/s]
47%|####6 | 79.2M/170M [00:00<00:00, 346MB/s]
67%|######6 | 113M/170M [00:00<00:00, 336MB/s]
88%|########7 | 149M/170M [00:00<00:00, 352MB/s]
100%|##########| 170M/170M [00:00<00:00, 338MB/s]
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/conv.py:456: UserWarning:

Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at …/aten/src/ATen/native/cudnn/Conv_v8.cpp:919.)

Epoch: [0] [ 0/60] eta: 0:00:27 lr: 0.000090 loss: 4.9131 (4.9131) loss_classifier: 0.4438 (0.4438) loss_box_reg: 0.1060 (0.1060) loss_mask: 4.3589 (4.3589) loss_objectness: 0.0021 (0.0021) loss_rpn_box_reg: 0.0023 (0.0023) time: 0.4656 data: 0.0142 max mem: 2421
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:744: UserWarning:

Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at …/aten/src/ATen/native/cudnn/Conv_v8.cpp:919.)

Epoch: [0] [10/60] eta: 0:00:15 lr: 0.000936 loss: 1.7988 (2.7740) loss_classifier: 0.4161 (0.3552) loss_box_reg: 0.3087 (0.2540) loss_mask: 0.9491 (2.1314) loss_objectness: 0.0227 (0.0266) loss_rpn_box_reg: 0.0056 (0.0069) time: 0.3027 data: 0.0155 max mem: 2918
Epoch: [0] [20/60] eta: 0:00:10 lr: 0.001783 loss: 0.7890 (1.7882) loss_classifier: 0.2149 (0.2672) loss_box_reg: 0.2016 (0.2329) loss_mask: 0.3993 (1.2588) loss_objectness: 0.0162 (0.0214) loss_rpn_box_reg: 0.0076 (0.0079) time: 0.2652 data: 0.0155 max mem: 2920
Epoch: [0] [30/60] eta: 0:00:07 lr: 0.002629 loss: 0.6767 (1.4244) loss_classifier: 0.1435 (0.2252) loss_box_reg: 0.2288 (0.2425) loss_mask: 0.2597 (0.9274) loss_objectness: 0.0123 (0.0195) loss_rpn_box_reg: 0.0101 (0.0098) time: 0.2443 data: 0.0164 max mem: 2921
Epoch: [0] [40/60] eta: 0:00:05 lr: 0.003476 loss: 0.5594 (1.2069) loss_classifier: 0.0942 (0.1909) loss_box_reg: 0.2408 (0.2357) loss_mask: 0.2277 (0.7536) loss_objectness: 0.0071 (0.0169) loss_rpn_box_reg: 0.0118 (0.0097) time: 0.2398 data: 0.0168 max mem: 2921
Epoch: [0] [50/60] eta: 0:00:02 lr: 0.004323 loss: 0.3676 (1.0410) loss_classifier: 0.0578 (0.1629) loss_box_reg: 0.1529 (0.2171) loss_mask: 0.1593 (0.6378) loss_objectness: 0.0030 (0.0140) loss_rpn_box_reg: 0.0073 (0.0092) time: 0.2282 data: 0.0161 max mem: 2921
Epoch: [0] [59/60] eta: 0:00:00 lr: 0.005000 loss: 0.3425 (0.9415) loss_classifier: 0.0407 (0.1449) loss_box_reg: 0.1228 (0.2045) loss_mask: 0.1593 (0.5710) loss_objectness: 0.0017 (0.0123) loss_rpn_box_reg: 0.0063 (0.0088) time: 0.2230 data: 0.0153 max mem: 2921
Epoch: [0] Total time: 0:00:14 (0.2475 s / it)
creating index…
index created!
Test: [ 0/50] eta: 0:00:07 model_time: 0.1294 (0.1294) evaluator_time: 0.0071 (0.0071) time: 0.1499 data: 0.0130 max mem: 2921
Test: [49/50] eta: 0:00:00 model_time: 0.0424 (0.0770) evaluator_time: 0.0048 (0.0074) time: 0.0747 data: 0.0103 max mem: 2921
Test: Total time: 0:00:04 (0.0962 s / it)
Averaged stats: model_time: 0.0424 (0.0770) evaluator_time: 0.0048 (0.0074)
Accumulating evaluation results…
DONE (t=0.01s).
Accumulating evaluation results…
DONE (t=0.01s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.666
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.984
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.889
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.301
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.622
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.678
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.289
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.717
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.717
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.400
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.708
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.726
IoU metric: segm
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.676
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.973
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.805
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.446
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.539
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.690
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.299
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.731
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.732
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.633
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.692
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.739
Epoch: [1] [ 0/60] eta: 0:00:10 lr: 0.005000 loss: 0.2538 (0.2538) loss_classifier: 0.0204 (0.0204) loss_box_reg: 0.0604 (0.0604) loss_mask: 0.1699 (0.1699) loss_objectness: 0.0001 (0.0001) loss_rpn_box_reg: 0.0031 (0.0031) time: 0.1800 data: 0.0149 max mem: 2921
Epoch: [1] [10/60] eta: 0:00:11 lr: 0.005000 loss: 0.3333 (0.3693) loss_classifier: 0.0447 (0.0525) loss_box_reg: 0.1257 (0.1422) loss_mask: 0.1603 (0.1642) loss_objectness: 0.0016 (0.0022) loss_rpn_box_reg: 0.0084 (0.0082) time: 0.2395 data: 0.0172 max mem: 2921
Epoch: [1] [20/60] eta: 0:00:08 lr: 0.005000 loss: 0.3333 (0.3463) loss_classifier: 0.0365 (0.0448) loss_box_reg: 0.1092 (0.1181) loss_mask: 0.1679 (0.1743) loss_objectness: 0.0016 (0.0020) loss_rpn_box_reg: 0.0069 (0.0071) time: 0.2214 data: 0.0159 max mem: 2921
Epoch: [1] [30/60] eta: 0:00:06 lr: 0.005000 loss: 0.3062 (0.3281) loss_classifier: 0.0355 (0.0450) loss_box_reg: 0.0809 (0.1122) loss_mask: 0.1473 (0.1624) loss_objectness: 0.0010 (0.0017) loss_rpn_box_reg: 0.0048 (0.0068) time: 0.2082 data: 0.0155 max mem: 2921
Epoch: [1] [40/60] eta: 0:00:04 lr: 0.005000 loss: 0.2747 (0.3229) loss_classifier: 0.0461 (0.0440) loss_box_reg: 0.0809 (0.1069) loss_mask: 0.1425 (0.1633) loss_objectness: 0.0009 (0.0017) loss_rpn_box_reg: 0.0051 (0.0070) time: 0.2188 data: 0.0162 max mem: 2921
Epoch: [1] [50/60] eta: 0:00:02 lr: 0.005000 loss: 0.2658 (0.3112) loss_classifier: 0.0307 (0.0420) loss_box_reg: 0.0674 (0.0999) loss_mask: 0.1544 (0.1610) loss_objectness: 0.0009 (0.0018) loss_rpn_box_reg: 0.0043 (0.0065) time: 0.2254 data: 0.0152 max mem: 2921
Epoch: [1] [59/60] eta: 0:00:00 lr: 0.005000 loss: 0.2286 (0.2973) loss_classifier: 0.0351 (0.0407) loss_box_reg: 0.0544 (0.0933) loss_mask: 0.1298 (0.1551) loss_objectness: 0.0009 (0.0020) loss_rpn_box_reg: 0.0032 (0.0062) time: 0.2254 data: 0.0158 max mem: 2921
Epoch: [1] Total time: 0:00:13 (0.2204 s / it)
creating index…
index created!
Test: [ 0/50] eta: 0:00:02 model_time: 0.0411 (0.0411) evaluator_time: 0.0039 (0.0039) time: 0.0584 data: 0.0130 max mem: 2921
Test: [49/50] eta: 0:00:00 model_time: 0.0395 (0.0410) evaluator_time: 0.0029 (0.0041) time: 0.0554 data: 0.0103 max mem: 2921
Test: Total time: 0:00:02 (0.0570 s / it)
Averaged stats: model_time: 0.0395 (0.0410) evaluator_time: 0.0029 (0.0041)
Accumulating evaluation results…
DONE (t=0.01s).
Accumulating evaluation results…
DONE (t=0.01s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.732
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.985
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.937
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.433
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.676
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.746
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.312
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.778
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.778
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.433
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.758
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.789
IoU metric: segm
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.715
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.993
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.869
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.368
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.553
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.733
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.313
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.760
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.760
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.533
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.683
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.774
That’s it!

In the tutorial they conclude that “we obtain a COCO-style mAP > 50, and a mask mAP of 65”.

I was wondering if anyone could explain how they came to that conclusion? Which of the lines in the output above do those two numbers correspond to?
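To make the question concrete, here is a minimal sketch of how I am currently reading the log. It assumes (and this is only my assumption, not something the tutorial states) that "COCO-style mAP" means the first summary line, AP @[ IoU=0.50:0.95 | area=all | maxDets=100 ], multiplied by 100, with "bbox" being the box mAP and "segm" the mask mAP. The function name `primary_map` is made up for illustration.

```python
import re

# Assumption: the "COCO-style mAP" is the first AP line of each summary,
# i.e. AP averaged over IoU thresholds 0.50:0.95, all areas, maxDets=100.
PRIMARY_AP = re.compile(
    r"Average Precision\s+\(AP\)\s+@\[\s*IoU=0\.50:0\.95\s*\|"
    r"\s*area=\s*all\s*\|\s*maxDets=100\s*\]\s*=\s*([0-9.]+)"
)
METRIC = re.compile(r"IoU metric:\s*(\w+)")


def primary_map(log_text):
    """Return {iou_type: AP in percent} from a pasted evaluation log.

    If the log covers several epochs, later epochs overwrite earlier ones,
    so the result reflects the last evaluation in the text.
    """
    results = {}
    current = None  # "bbox" or "segm", set by the most recent header line
    for line in log_text.splitlines():
        header = METRIC.search(line)
        if header:
            current = header.group(1)
            continue
        match = PRIMARY_AP.search(line)
        if match and current is not None:
            results[current] = round(float(match.group(1)) * 100, 1)
    return results


# Two header lines and their first AP lines, copied from the second epoch above.
sample = """IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.732
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.985
IoU metric: segm
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.715
"""
print(primary_map(sample))
```

Read this way, the second epoch above would give a box mAP of about 73 and a mask mAP of about 72, which is higher than the figures quoted in the tutorial text. I believe the same values should also be available without parsing, through the pycocotools `COCOeval.stats` array (index 0) held in `coco_evaluator.coco_eval["bbox"]` and `coco_evaluator.coco_eval["segm"]`, but I have not verified that against the tutorial's helper code.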

Thanks!