Detectron2 Video inferencing shows category IDs instead of class names

I’m using detectron2 to run object detection on a video with a model trained on a custom dataset. The output video shows the detections, but each box is labeled with a category ID, either 1 or 2. I want the boxes to show the class names, ‘weed’ or ‘crop’, instead.

I downloaded the config file and model weights from my custom training and used them as follows:

%run detectron2/demo/demo.py --config-file /content/config2.yaml --video-input /content/crop-weed.mp4 --confidence-threshold 0.6 --output output_colab.mp4 \
--opts MODEL.WEIGHTS /content/output/model_final.pth

Expected behavior

I expected detectron2 to run inference on the video and show the class name (‘weed’ or ‘crop’) next to the bounding box of each detected object.

I need help, please.

Check which part of the code adds the text to the predicted boxes, and replace the ID with a mapping to the class name.
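A minimal sketch of that mapping, assuming the labels are built the way I understand Detectron2’s internal `_create_text_labels` helper to work: use `class_names[id]` when a class-name list is available, and fall back to the ID as a string when it is not. `make_labels` is a hypothetical name used only for illustration; it is not a Detectron2 API.

```python
from typing import List, Optional, Sequence


def make_labels(classes: Sequence[int], scores: Sequence[float],
                class_names: Optional[Sequence[str]]) -> List[str]:
    """Return one 'name score%' label per detection (hypothetical helper)."""
    if class_names:
        # Map each predicted category ID to its class name.
        names = [class_names[c] for c in classes]
    else:
        # No class names available: fall back to the numeric ID as text.
        names = [str(c) for c in classes]
    return [f"{name} {score * 100:.0f}%" for name, score in zip(names, scores)]


print(make_labels([1, 2], [0.91, 0.75], None))
# ['1 91%', '2 75%']
print(make_labels([1, 2], [0.91, 0.75], ["weed-crops", "crop", "weed"]))
# ['crop 91%', 'weed 75%']
```

The drawing code itself does not need to change much; what matters is whether a class-name list reaches it at all.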

That’s what I’m trying to find out. The Detectron2 starter notebook uses the code below to run object detection on a video:

%run detectron2/demo/demo.py --config-file detectron2/configs/COCO-PanopticSegmentation/panoptic_fpn_R_101_3x.yaml --video-input video-clip.mp4 --confidence-threshold 0.6 --output video-output.mkv \
  --opts MODEL.WEIGHTS detectron2://COCO-PanopticSegmentation/panoptic_fpn_R_101_3x/139514519/model_final_cafdb1.pkl

I followed the same template, so I don’t know why it doesn’t work.

The example above uses a pretrained model, whereas I want to use a model trained on my own dataset. That’s why my model weights and YAML file are different.
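One likely reason the pretrained example works while the custom model does not: as far as I can tell, the stock `demo/predictor.py` picks its metadata with `MetadataCatalog.get(cfg.DATASETS.TEST[0] if len(cfg.DATASETS.TEST) else "__unused")`. Builtin COCO datasets are registered with their class names, but a custom dataset only has them if it is registered (with `thing_classes`) in the same Python process and named in `DATASETS.TEST`. The sketch below models that lookup with a plain dictionary; `REGISTRY`, `lookup_class_names`, and the dataset names are hypothetical stand-ins, not the real catalog.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical stand-in for a metadata catalog: dataset name -> class names.
# A builtin dataset is pre-registered; custom ones are not.
REGISTRY: Dict[str, List[str]] = {"coco_builtin_example": ["person", "bicycle", "car"]}


def lookup_class_names(test_datasets: Sequence[str]) -> Optional[List[str]]:
    # Same selection rule as the demo: first test dataset, else a placeholder name.
    name = test_datasets[0] if len(test_datasets) else "__unused"
    return REGISTRY.get(name)  # None means no names, so labels fall back to IDs


print(lookup_class_names(("coco_builtin_example",)))  # ['person', 'bicycle', 'car']
print(lookup_class_names(()))                         # None
print(lookup_class_names(("my_dataset_train",)))      # None (not registered yet)

# Registering the custom names in the same process makes them visible.
REGISTRY["my_dataset_train"] = ["weed-crops", "crop", "weed"]
print(lookup_class_names(("my_dataset_train",)))      # ['weed-crops', 'crop', 'weed']
```

In Detectron2 itself, the counterpart of the last assignment would be setting `thing_classes` on `MetadataCatalog.get("my_dataset_train")` before the demo builds its visualizer.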

I can’t find anything in the docs about adding text to the predicted bounding boxes. Many other people have reported this problem, but the Detectron2 maintainers haven’t resolved those issues.

That’s why I’m asking the PyTorch community for help, since Detectron2 is built on PyTorch.

Hi @ptrblck, just to let you know.

Detectron2 has a VideoVisualizer class that contains the logic for drawing detections on video frames. That is where metadata such as the class labels is used.

There is also demo.py, which defines the command-line arguments.

I have tried editing these two files to fix my video inference, but it hasn’t worked. In demo.py, I added a new parser argument called --vid_meta that accepts the training metadata:

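import argparse

def get_parser():
    parser = argparse.ArgumentParser(description="Detectron2 demo for builtin configs")
    # ... existing demo.py arguments (--config-file, --video-input, --opts, ...) ...
    parser.add_argument(
        "--vid_meta",
        default=['Crop', 'Weed'],
        help='Adding Video metadata'
    )
    return parser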

Further down, I passed that argument to the VideoVisualizer class as metadata:

from detectron2.utils.video_visualizer import VideoVisualizer
if args.vid_meta:
    VideoVisualizer(metadata=args.vid_meta)
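As written, this snippet constructs a `VideoVisualizer` but never stores it, and `args.vid_meta` is a list by default (or a string when given on the command line) rather than a metadata object with a `thing_classes` field. If I read the stock demo correctly, the visualizer that actually draws the frames is created separately inside `VisualizationDemo` in `demo/predictor.py`, so this extra instance would not change the output. The pure-Python sketch below illustrates both points; `FrameLabeler` is a hypothetical stand-in for the visualizer, and `types.SimpleNamespace` stands in for the metadata. It does not import Detectron2.

```python
from types import SimpleNamespace
from typing import List, Sequence


class FrameLabeler:
    """Hypothetical stand-in for a video visualizer: it only builds label strings."""

    def __init__(self, metadata):
        self.metadata = metadata

    def labels(self, classes: Sequence[int]) -> List[str]:
        # Use class names when the metadata carries them, otherwise the raw IDs.
        names = getattr(self.metadata, "thing_classes", None)
        return [names[c] if names else str(c) for c in classes]


crop_weed_meta = SimpleNamespace(thing_classes=["weed-crops", "crop", "weed"])

# The instance the video loop uses was built from metadata without class names.
labeler_used_by_loop = FrameLabeler(SimpleNamespace())

# Constructing another instance and discarding it changes nothing:
FrameLabeler(metadata=crop_weed_meta)
print(labeler_used_by_loop.labels([1, 2]))  # ['1', '2']

# A plain list has no thing_classes attribute, so it is ignored as well:
print(FrameLabeler(["Crop", "Weed"]).labels([1, 2]))  # ['1', '2']

# What matters is that the instance actually used receives the names:
labeler_used_by_loop = FrameLabeler(crop_weed_meta)
print(labeler_used_by_loop.labels([1, 2]))  # ['crop', 'weed']
```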

I also changed how VideoVisualizer builds the labels so that it can read them from the training metadata. In the draw_instance_predictions() method of the VideoVisualizer class, I changed the labels line

from:
labels = _create_text_labels(classes, scores, self.metadata.get("thing_classes", None))
to:
labels = _create_text_labels(classes, scores, self.metadata.thing_classes)
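As far as I can tell, Detectron2’s own line looks the names up with `self.metadata.get("thing_classes", None)`, which returns `None` when the field is missing, and that `None` is what makes the labels fall back to numeric IDs. Attribute access (in Detectron2’s `Metadata` class, as far as I know) raises `AttributeError` instead when the field is missing, and the field is spelled `thing_classes`. Neither form adds class names; they only differ in how a missing field is handled. The sketch below shows the difference with a hypothetical, simplified `MiniMetadata` class, not Detectron2’s own.

```python
class MiniMetadata:
    """Hypothetical, simplified metadata object (not Detectron2's Metadata class)."""

    def __init__(self, **fields):
        self._fields = dict(fields)

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails, i.e. for metadata fields.
        try:
            return self._fields[key]
        except KeyError:
            raise AttributeError(f"metadata has no field {key!r}") from None

    def get(self, key, default=None):
        # Dictionary-style lookup that returns `default` instead of raising.
        return self._fields.get(key, default)


empty = MiniMetadata()
print(empty.get("thing_classes", None))  # None -> labels fall back to IDs
try:
    empty.thing_classes
except AttributeError as err:
    print("AttributeError:", err)  # AttributeError: metadata has no field 'thing_classes'

crop_weed = MiniMetadata(thing_classes=["weed-crops", "crop", "weed"])
print(crop_weed.get("thing_classes", None)[1], crop_weed.thing_classes[2])  # crop weed
```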

After making those edits to both VideoVisualizer and demo.py, I re-ran the video detection command:

%run /content/demo.py --config-file /content/drive/MyDrive/Real-Time-Object-Detection/Detectron2-models/config4.yaml \
--video-input /content/crop-weed.mp4 --confidence-threshold 0.6 --output output_colab6.mp4 --vid_meta train_metadata \
--opts MODEL.WEIGHTS /content/output/model_final.pth
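One thing worth checking: command-line arguments arrive as strings, so `--vid_meta train_metadata` should give the literal string `'train_metadata'`, not the `train_metadata` object from the notebook, and the `['Crop', 'Weed']` default is only used when the flag is omitted. The sketch below demonstrates this with plain `argparse`. The `--class_names` option with `nargs="+"` is a hypothetical alternative for passing the names themselves; demo.py does not provide it.

```python
import argparse

# Same definition as the --vid_meta argument added to demo.py.
parser = argparse.ArgumentParser()
parser.add_argument("--vid_meta", default=["Crop", "Weed"], help="Adding Video metadata")

from_cli = parser.parse_args(["--vid_meta", "train_metadata"]).vid_meta
print(repr(from_cli))  # 'train_metadata': a plain string, not the notebook object

without_flag = parser.parse_args([]).vid_meta
print(without_flag)  # ['Crop', 'Weed']: the default, used only when the flag is absent

# Hypothetical alternative (not part of demo.py): pass the class names directly.
names_parser = argparse.ArgumentParser()
names_parser.add_argument("--class_names", nargs="+", default=None)
class_names = names_parser.parse_args(["--class_names", "weed-crops", "crop", "weed"]).class_names
print(class_names)  # ['weed-crops', 'crop', 'weed']
```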

where train_metadata is:

namespace(name='my_dataset_train',
          json_file='/content/Annotated-Images-Dataset-1/train/_annotations.coco.json',
          image_root='/content/Annotated-Images-Dataset-1/train',
          evaluator_type='coco',
          thing_classes=['weed-crops', 'crop', 'weed'],
          thing_dataset_id_to_contiguous_id={0: 0, 1: 1, 2: 2})
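Given these `thing_classes`, the predicted IDs 1 and 2 mentioned in the question would index to ‘crop’ and ‘weed’ (index 0 is the ‘weed-crops’ entry, and `thing_dataset_id_to_contiguous_id` is the identity). The two-item `--vid_meta` default `['Crop', 'Weed']` is shifted by one relative to that list. A small check using only the values shown above:

```python
# thing_classes from the training metadata above; index 0 is 'weed-crops'.
thing_classes = ["weed-crops", "crop", "weed"]
predicted_ids = [1, 2]  # the category IDs that appear in the output video

print([thing_classes[i] for i in predicted_ids])  # ['crop', 'weed']

# The two-entry --vid_meta default does not line up with these IDs.
vid_meta_default = ["Crop", "Weed"]
print(vid_meta_default[1])  # Weed: category 1 (crop) would be labeled as a weed
try:
    vid_meta_default[2]
except IndexError:
    print("IndexError: category 2 has no entry")
```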

When I reran it and downloaded the video, it unfortunately still didn’t work :sob:

But I’m very confident the solution lies in configuring the VideoVisualizer class. I’d appreciate your help with this.

Hi @ptrblck, I’m waiting on you to help me through this process.

What do you think about the VideoVisualizer class and demo.py that Detectron2 provides? How can we modify them so that class names, rather than category IDs, appear next to the bounding boxes?