Is there any non-blocking version of torch.nonzero / aten::nonzero?

No: the output size (needed on the CPU) depends on the input data (on the GPU), and this forces the sync. Depending on your application and the sizes involved, it might be better to work with a boolean mask or something similar that keeps a fixed shape.
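To make the mask idea concrete, here is a minimal sketch, assuming the goal is a reduction over the selected elements (the tensor `x` and the `x > 0` condition are made up for illustration, not taken from the question). The `torch.nonzero` version produces an index tensor whose length depends on the data, while the mask version only creates tensors with the same shape as `x`:

```python
import torch

# Illustrative setup: falls back to CPU so the snippet runs anywhere.
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.tensor([0.5, -1.0, 2.0, -3.0, 4.0], device=device)

# Index-based: the length of idx depends on the values in x,
# so on a GPU the host has to wait for the result to size the output.
idx = torch.nonzero(x > 0).squeeze(1)
sum_via_indices = x[idx].sum()

# Mask-based: every intermediate has the same shape as x,
# so no data-dependent output size is needed.
mask = x > 0
sum_via_mask = torch.where(mask, x, torch.zeros_like(x)).sum()
```

Whether this pays off depends on how sparse the selection is: the mask version does work on every element, so for a very small selection out of a large tensor the sync may still be the cheaper option.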

Best regards

Thomas
