Argmax performance slower than NumPy?

Thanks, filed: https://github.com/pytorch/pytorch/issues/8817