# Behavior of torch.unique

I expected either `torch.unique` or `torch.unique_consecutive` to give a tensor of unique elements; instead, I get a random shuffle of the elements.

Here is what I did; please let me know if this is the correct way of invoking things.

```python
a = torch.tensor([[3, 1, 3, 3, 4], [2, 1, 4, 3, 1]], dtype=torch.long)
output = torch.unique(a, sorted=True, dim=1)
output = torch.unique_consecutive(output, dim=1)
a.shape, output.shape, output
# (torch.Size([2, 5]),
#  torch.Size([2, 5]),
#  tensor([[1, 3, 3, 3, 4],
#          [1, 2, 3, 4, 1]]))
```

Ok, fine, maybe chaining is bad (but it should not be, really: `unique_consecutive` expects duplicate elements to be consecutive, which I believe is achieved by `sorted=True`). Well, probably the sorting is done along the same dim. So this is a mystery, but ok.
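The observed output stops being a mystery once `dim=1` is read as "each column is one element": the five columns of `a` are all distinct as whole columns, so none are dropped; they are only reordered lexicographically. A quick check (same `a` as above):

```python
import torch

a = torch.tensor([[3, 1, 3, 3, 4], [2, 1, 4, 3, 1]], dtype=torch.long)
# With dim=1 the columns (3,2), (1,1), (3,4), (3,3), (4,1) are the elements
# being compared. All five are distinct, so all five survive; sorted=True
# merely orders them lexicographically: (1,1), (3,2), (3,3), (3,4), (4,1).
out = torch.unique(a, sorted=True, dim=1)
print(out)
# tensor([[1, 3, 3, 3, 4],
#         [1, 2, 3, 4, 1]])
```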

With the same `a = torch.tensor([[3,1,3,3,4], [2,1,4,3,1]], dtype=torch.long)`

`output = torch.unique(a, dim=1)` yields

```python
tensor([[1, 3, 3, 3, 4],
        [1, 2, 3, 4, 1]])
```

and `output = torch.unique(a, dim=0)` yields

```python
tensor([[2, 1, 4, 3, 1],
        [3, 1, 3, 3, 4]])
```

Clearly I am misunderstanding the usage of `torch.unique`.

I expected to achieve something like `[list(set(x.tolist())) for x in a]`, which clearly cannot be a tensor, because not all rows would have the same number of unique elements. So if the return type is a tensor, somewhere something must be casting/padding additional elements to satisfy the tensor return type, which of course defeats the purpose, as the elements are now repeated.

In the simple example for `a`, such a unique operation yields:

```python
z = [list(set(x.tolist())) for x in a]
z
# [[1, 3, 4], [1, 2, 3, 4]]
```

For now I have this workaround (I need the unique values in their order of appearance; I could use an `OrderedDict`, but that takes more space):

```python
def unique(tensor):
    '''
    Returns a tensor with the unique values of the given 1-D tensor,
    preserving their order of first appearance.
    '''
    seen = set()
    unique_values = []
    for val in tensor.tolist():
        if val not in seen:
            seen.add(val)  # mark as seen so later duplicates are skipped
            unique_values.append(val)
    return tensor.new_tensor(unique_values)
```

The problem with this function is that it expects the tensor to be 1-D, so for a multi-dimensional tensor I have to apply it in a loop, all of which are CPU operations that inherently require copying big tensors because of `tolist()`. I was hoping to rely on a CUDA-based implementation, but because of the `sort` I cannot. Still, it would be nice to at least have a `torch.unique()` that does not return repeated values.

It would be of great help if someone could update the docs with an example that hits this problem. (Perhaps the same `a`?)

If I am wrong in my understanding of how to use these two functions, please let me know; it would be a tremendous help. In that case, using a more complicated example in the docs would further help others.

Many thanks!

`torch.unique` with a `dim` argument will return the unique slices along the specified dimension, while `dim=None` will treat the input as a flattened tensor.
Have a look at this example:

```python
x = torch.tensor([[0, 0, 0],
                  [0, 0, 1],
                  [0, 0, 0],
                  [0, 0, 1]])

print(torch.unique(x, dim=0))
# tensor([[0, 0, 0],
#         [0, 0, 1]])
print(torch.unique(x, dim=1))
# tensor([[0, 0],
#         [0, 1],
#         [0, 0],
#         [0, 1]])
print(torch.unique(x))
# tensor([0, 1])
```

If you want to get the flattened unique values for each row, you would need to use a loop, as `torch.unique` won’t add padding to the result.
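A minimal sketch of that loop, assuming a 2-D tensor like the `a` from the question; since the rows have different numbers of unique values, the results are collected in a Python list rather than stacked into a tensor:

```python
import torch

a = torch.tensor([[3, 1, 3, 3, 4], [2, 1, 4, 3, 1]], dtype=torch.long)

# One torch.unique call per row; the per-row results have different
# lengths, so a list of 1-D tensors is the natural container.
per_row = [torch.unique(row) for row in a]
print(per_row)
# [tensor([1, 3, 4]), tensor([1, 2, 3, 4])]
```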

Thank you very much! So I was correct in understanding that there is padding.

While this padding is required for building a tensor, it is quite misleading to see repeats. Updating the docs with your example above would be really helpful.

For my use case I will stick to my implementation then, as there is no way but to use a loop to avoid padded duplicates. This is mostly because I need the ordered unique values. The alternative would be to get the relative indices and then shuffle.
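One possible sketch of that "relative indices, then shuffle" idea for a 1-D tensor; `ordered_unique` is a hypothetical helper name, and the `stable=True` keyword of `torch.sort` needs PyTorch ≥ 1.9. Everything runs through tensor ops, so the data can stay on the GPU:

```python
import torch

def ordered_unique(t):
    # Sorted unique values, the index of each input element within
    # `uniq`, and the number of occurrences of each unique value.
    uniq, inverse, counts = torch.unique(
        t, sorted=True, return_inverse=True, return_counts=True)
    # A stable sort of `inverse` groups positions by unique value while
    # keeping the earliest position first inside each group.
    order = torch.sort(inverse, stable=True).indices
    # The start of each group is the first occurrence of that value.
    group_starts = torch.cat(
        [torch.zeros(1, dtype=torch.long, device=t.device),
         counts.cumsum(0)[:-1]])
    first_idx = order[group_starts]
    # Reorder the unique values by their first appearance in `t`.
    return t[first_idx.sort().values]

print(ordered_unique(torch.tensor([3, 1, 3, 3, 4])))
# tensor([3, 1, 4])
```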

I don’t see where the padding is.
My code snippet returns the unique rows, then the unique columns, and finally the unique scalars.
Where are these tensors potentially padded?

> My code snippet returns the unique rows, then the unique columns, and finally the unique scalars.