PyTorch standard COCO dataset (datasets.CocoDetection) not compatible with Faster R-CNN object detection model

I am trying to train and evaluate a pre-trained Faster R-CNN model on the standard COCO dataset, but I am getting the following error:

TypeError: RandomIoUCrop() requires input sample to contain tensor or PIL images and bounding boxes. Sample can also contain masks.

Here are the high-level steps I followed:

  1. Downloaded the COCO 2017 dataset
  2. Prepared the PyTorch dataset following the steps in "Transforms v2: End-to-end object detection/segmentation example" (Torchvision main documentation)
  3. Trained and evaluated the Faster R-CNN model following the steps in "TorchVision Object Detection Finetuning Tutorial" (PyTorch Tutorials 2.2.1+cu121 documentation)
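
For context, here is a minimal sketch of the data shapes involved in step 2. The raw target that datasets.CocoDetection returns is a list of annotation dicts with boxes in COCO's [x, y, width, height] layout, while torchvision detection models expect a dict with "boxes" in [x1, y1, x2, y2] layout and "labels". The helper name coco_to_detection_target and the sample annotations are hypothetical, written in plain Python only to illustrate the conversion; this is not torchvision's own code and not necessarily the cause of the error.

```python
# Hypothetical helper (not part of torchvision): converts a raw
# CocoDetection-style target (list of annotation dicts) into the
# {"boxes", "labels"} layout used by torchvision detection models.
def coco_to_detection_target(annotations):
    boxes, labels = [], []
    for ann in annotations:
        x, y, w, h = ann["bbox"]  # COCO stores boxes as [x, y, width, height]
        if w <= 0 or h <= 0:
            continue  # skip degenerate boxes with no area
        boxes.append([x, y, x + w, y + h])  # convert to [x1, y1, x2, y2]
        labels.append(ann["category_id"])
    return {"boxes": boxes, "labels": labels}


# Made-up annotations for illustration; the second box has zero width.
raw_target = [
    {"bbox": [10.0, 20.0, 30.0, 40.0], "category_id": 18},
    {"bbox": [5.0, 5.0, 0.0, 10.0], "category_id": 1},
]
target = coco_to_detection_target(raw_target)
print(target)
```

In a real pipeline the boxes and labels would be tensors rather than lists, and the Torchvision example handles this conversion through datasets.wrap_dataset_for_transforms_v2. Note that an image with no annotations yields an empty list of boxes here, so samples without any bounding boxes may be worth checking for when a transform such as RandomIoUCrop complains that boxes are missing.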

Any help will be appreciated. Thanks.

Here is a Colab notebook showing the steps and the error I am getting.