Torchvision Faster R-CNN model input arguments

The PyTorch documentation for Faster R-CNN on this page says that, during training, the targets passed to the model are a list of dictionaries with the keys ‘boxes’ and ‘labels’.

However, the fine-tuning tutorial for Faster R-CNN defines additional keys in each dictionary, such as ‘image_id’, ‘area’ and ‘iscrowd’.

The documentation does not mention any of those keys. How do we resolve this discrepancy?

Training uses only boxes and labels and ignores any other keys in the target dictionaries. As the tutorial says, image_id, area and iscrowd are used in evaluation (but not during training).

Best regards

Thomas