TypeError: RandomIoUCrop() requires input sample to contain tensor or PIL images and bounding boxes. Sample can also contain masks

I am getting started with torchvision to train and evaluate object detection models, and I am running into issues that I need some help with.
As a first step, I am setting up a basic pipeline to evaluate a pre-trained model on the COCO 2017 dataset. Here is my Colab notebook.

To set up the data loader for the COCO dataset, I am using this example from PyTorch.
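One likely trigger, assuming the dataset is wrapped with torchvision's wrap_dataset_for_transforms_v2 (the notebook itself isn't shown, so this is a guess): for an image with no annotations, the wrapped target holds only image_id and no bounding boxes, so RandomIoUCrop has nothing to work with. COCO 2017 contains some images like this. Below is a minimal pure-Python sketch that takes the COCO annotation JSON (already loaded as a dict) and keeps only images with at least one usable box. The function name image_ids_with_boxes and the 1-pixel size threshold are my own assumptions, loosely modeled on the filtering done in torchvision's detection reference scripts.

```python
from typing import Any, Dict, List, Set


def image_ids_with_boxes(coco: Dict[str, Any], min_size: float = 1.0) -> List[int]:
    """Return sorted ids of images that have at least one box larger than min_size.

    `coco` is the parsed COCO annotation JSON (keys "images" and "annotations").
    Hypothetical helper; the threshold mirrors the common "ignore <=1 px boxes" rule.
    """
    keep: Set[int] = set()
    for ann in coco.get("annotations", []):
        # COCO boxes are [x, y, width, height]
        _, _, w, h = ann["bbox"]
        if w > min_size and h > min_size:
            keep.add(ann["image_id"])
    # Preserve only images that appear in the "images" list and have a usable box
    return sorted(img["id"] for img in coco.get("images", []) if img["id"] in keep)


if __name__ == "__main__":
    toy = {
        "images": [{"id": 1}, {"id": 2}, {"id": 3}],
        "annotations": [
            {"image_id": 1, "bbox": [0, 0, 10, 10]},  # usable box
            {"image_id": 3, "bbox": [0, 0, 0.5, 10]},  # too thin, ignored
        ],  # image 2 has no annotations at all
    }
    print(image_ids_with_boxes(toy))
```

To apply it, the annotation file could be loaded with json.load, and the dataset indices whose image id (CocoDetection keeps these in its ids attribute) appears in the returned list could then be selected with torch.utils.data.Subset.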

After the data loader steps, I added steps to train and evaluate the model using engine.py.

Any help resolving this issue would be appreciated, as would suggestions for alternative approaches.