nonzero() is time-consuming


Hi, I've recently been working on object detection. During post-processing, I need to select the valid bounding boxes whose confidence scores are higher than a threshold. The code is as follows:

c_mask = conf_scores[cl].gt(self.conf_thresh).nonzero().view(-1)

However, the nonzero() call in this line is really time-consuming: it takes around 17 ms to process the 11620 × 20 class-score results.

Does anyone have any ideas about this?
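A minimal sketch of one option, assuming conf_scores has shape [num_classes, num_priors] (as conf_scores[cl] suggests) and that the line above runs inside a loop over the 20 classes. select_per_class and select_all_classes are hypothetical names. On a CUDA tensor, nonzero() has a data-dependent output size, so PyTorch has to synchronize with the device to read that size back. Calling it once on the full 2-D mask instead of once per class cuts the number of nonzero() calls from 20 to 1.

```python
import torch


def select_per_class(conf_scores: torch.Tensor, conf_thresh: float) -> list:
    """Pattern from the post: one nonzero() call per class row."""
    return [
        conf_scores[cl].gt(conf_thresh).nonzero().view(-1)
        for cl in range(conf_scores.size(0))
    ]


def select_all_classes(conf_scores: torch.Tensor, conf_thresh: float):
    """Hypothetical alternative: a single nonzero() over the whole [C, N] mask.

    Returns flat tensors of class ids, prior ids and scores for every entry
    above the threshold, ordered by class and then by prior index.
    """
    idx = conf_scores.gt(conf_thresh).nonzero()  # shape [K, 2]: (class, prior) pairs
    cls_ids, prior_ids = idx[:, 0], idx[:, 1]
    scores = conf_scores[cls_ids, prior_ids]
    return cls_ids, prior_ids, scores


# Usage with random scores (stand-in data, not real detector output).
torch.manual_seed(0)
conf_scores = torch.rand(20, 11620)
per_class = select_per_class(conf_scores, 0.5)
cls_ids, prior_ids, scores = select_all_classes(conf_scores, 0.5)

# Both approaches select the same priors for every class.
same = all(torch.equal(per_class[c], prior_ids[cls_ids == c]) for c in range(20))
print(same)
```

Note that splitting the flat result back into per-class tensors with boolean indexing (as the final check does) also synchronizes on CUDA, so the gain depends on whether the downstream code, for example per-class NMS, can work on the flat tensors directly.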

How did you time the code? I’m asking so that I can try to reproduce this.
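A sketch of one way to time it, assuming the tensors live on a CUDA device. CUDA kernels are launched asynchronously, so a plain timer around a single line mostly measures when the CPU regains control. nonzero() must wait for its result, and therefore for all kernels queued before it, so it can absorb the cost of earlier work. Synchronizing before starting and before stopping the clock attributes the time more fairly. time_op is a hypothetical helper; it falls back to plain CPU timing when no GPU is available.

```python
import time

import torch


def time_op(fn, *args, warmup: int = 3, iters: int = 20) -> float:
    """Return the mean wall-clock time of fn(*args) in milliseconds.

    Synchronizes the GPU (if present) before and after the timed loop so
    that queued asynchronous kernels are not charged to the wrong call.
    """
    use_cuda = torch.cuda.is_available()
    for _ in range(warmup):  # warm up caches and lazy CUDA initialization
        fn(*args)
    if use_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    if use_cuda:
        torch.cuda.synchronize()
    return (time.perf_counter() - start) * 1000.0 / iters


device = "cuda" if torch.cuda.is_available() else "cpu"
conf_scores = torch.rand(20, 11620, device=device)  # stand-in scores
ms = time_op(lambda s: s[0].gt(0.5).nonzero().view(-1), conf_scores)
print(f"{ms:.3f} ms per call")
```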

Hi @richard, thank you for your response.

You can try this code.

In this repo, the time branch has the relevant test at line 107.

You can add a timer there to check the result.

Thank you so much for your help.