So the only special case is when the input is a CPU tensor and the dim arg is None: in that case the output is unsorted by default. Right?