How to make torch.nonzero faster

Found the right timing method here: Measuring GPU tensor operation speed - #4 by apaszke
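A minimal sketch of that kind of measurement applied to `torch.nonzero`. It assumes the linked post's advice is the usual one for CUDA timing: warm up first, then call `torch.cuda.synchronize()` before starting and before stopping the clock, because CUDA kernels are queued asynchronously and a bare `time.perf_counter()` would mostly measure launch overhead. The helper name `time_nonzero`, the tensor size, and the repeat counts are illustrative choices, not taken from the thread.

```python
import time

import torch


def time_nonzero(x: torch.Tensor, repeats: int = 100, warmup: int = 10) -> float:
    """Return the mean wall-clock seconds per torch.nonzero(x) call (hypothetical helper)."""
    on_gpu = x.is_cuda
    # Warm-up: early calls can include one-off costs (CUDA context, allocator growth).
    for _ in range(warmup):
        torch.nonzero(x)
    if on_gpu:
        # Drain any queued GPU work so it is not billed to the timed loop.
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repeats):
        torch.nonzero(x)
    if on_gpu:
        # Wait for the last queued kernel to finish before reading the clock.
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / repeats


device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.rand(1_000_000, device=device) > 0.5  # boolean mask, roughly half True
mean_s = time_nonzero(x)
print(f"{device}: {mean_s * 1e3:.3f} ms per torch.nonzero call")
```

The same helper can be reused to compare alternatives on identical inputs, so that any speedup comes from the operation itself rather than from where the clock happened to stop.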