Mask RCNN slow evaluation

I’m training a Mask RCNN model in a distributed way over 2 GPUs. I’m using this as a template.

I can get it working with the COCO dataset, and am now repurposing it for my own dataset. Training works, but evaluation is extremely slow: about an hour to train and over 2 hours to evaluate.

When looking at the evaluate function in engine.py, I noticed this line:

# FIXME remove this and make paste_masks_in_image run on the GPU

Is it so slow because it’s running on the CPU?

You could profile the code to find out where the bottleneck in the evaluation run is.
The FIXME comment does sound suspicious, so could you link to that line of code?
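
As a rough starting point, here is a minimal sketch that uses Python's built-in cProfile to list the most expensive calls. `profile_call` and `fake_evaluate` are hypothetical names made up for this example; `fake_evaluate` only stands in for your real evaluation call.

```python
import cProfile
import io
import pstats


def profile_call(fn, *args, top=15, **kwargs):
    """Run fn(*args, **kwargs) under cProfile and return (result, report).

    The report lists the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = fn(*args, **kwargs)
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def fake_evaluate(n):
    # Hypothetical stand-in for the real evaluation function.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    result, report = profile_call(fake_evaluate, 100_000)
    print(report)
```

In your script you would wrap the real call instead, e.g. `profile_call(evaluate, model, data_loader, device=device)`, assuming your `evaluate` takes those arguments. cProfile only measures time spent on the Python/CPU side, which should be enough to see whether the mask pasting step dominates. With 2 GPUs, each process would produce its own report.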