NMS implementation slower in PyTorch compared to NumPy

See here for continuation: