I’m training a Mask R-CNN model distributed across 2 GPUs. I’m using this as a template.
It works with the COCO dataset, and I’m now adapting it to my own dataset. Training runs fine, but evaluation is extremely slow: about an hour to train and over 2 hours to evaluate.
When looking at the evaluate function in engine.py, I noticed this line:
# FIXME remove this and make paste_masks_in_image run on the GPU
Is evaluation so slow because paste_masks_in_image is running on the CPU?
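One way I could check this is to time the model forward pass and the evaluator update separately. Below is a minimal sketch using only the standard library. The `timed` helper, the stage names, and the `time.sleep` stand-ins are placeholders of mine, not code from engine.py:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per named stage.
stage_seconds = defaultdict(float)


@contextmanager
def timed(stage):
    """Add the wall-clock time spent inside the block to stage_seconds[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_seconds[stage] += time.perf_counter() - start


def report():
    """Return (stage, seconds) pairs sorted from slowest to fastest."""
    return sorted(stage_seconds.items(), key=lambda kv: kv[1], reverse=True)


# Stand-in workloads. In the real evaluation loop these blocks would wrap
# the model forward pass and the evaluator update (assumed placement).
for _ in range(3):
    with timed("model"):
        time.sleep(0.01)
    with timed("evaluator"):
        time.sleep(0.03)

for stage, seconds in report():
    print(f"{stage}: {seconds:.3f}s")
```

If a timed block launches GPU work, calling torch.cuda.synchronize() before reading the clock should keep asynchronous CUDA kernels from being counted against the wrong stage.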