NMS on the GPU takes much more time than in Caffe

How did you post this twice?

I thought that the PyTorch NMS was ultimately derived from the (Caffe2) Detectron implementation, so it seems strange that it should be that much slower.
You would have to show your benchmarking code before I would try to find out why it is different.
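
For comparison, here is a minimal sketch of the kind of timing harness I would expect to see. CUDA kernels are launched asynchronously, so a timer that stops without synchronizing first can measure only the launch overhead, and the first few calls often include one-time setup cost. The `benchmark` helper and its parameters are hypothetical names for illustration, and the commented usage assumes `torchvision.ops.nms` is the NMS being measured, which may not match your setup.

```python
import time


def benchmark(fn, sync=None, warmup=10, iters=100):
    """Return the mean wall-clock time in seconds for one call of fn().

    sync is an optional callable that blocks until queued device work has
    finished (for PyTorch on CUDA this would be torch.cuda.synchronize).
    """
    if sync is None:
        sync = lambda: None  # nothing to wait for on CPU-only code

    # Warm-up calls are excluded from the measurement.
    for _ in range(warmup):
        fn()
    sync()  # make sure warm-up work is done before starting the clock

    start = time.perf_counter()
    for _ in range(iters):
        fn()
    sync()  # wait for all timed work to finish before stopping the clock
    return (time.perf_counter() - start) / iters


# Assumed usage with PyTorch (needs torch, torchvision and a CUDA device):
#
#   import torch, torchvision
#   boxes = torch.rand(5000, 4, device="cuda")
#   boxes[:, 2:] += boxes[:, :2]          # ensure x2 >= x1 and y2 >= y1
#   scores = torch.rand(5000, device="cuda")
#   t = benchmark(lambda: torchvision.ops.nms(boxes, scores, 0.5),
#                 sync=torch.cuda.synchronize)
#   print(f"{t * 1e3:.3f} ms per call")
```

If your numbers change a lot once the warm-up and the synchronization are in place, the difference was probably in the measurement rather than in the NMS itself.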

Best regards

Thomas