Torch.unique not returning expected values

I have a tensor of size (2, 3). After running it through torch.unique with dim=1, I get a tensor of size (2, 3) for the unique values, but only a tensor of size (1, 3) for the counts. I was expecting to get (2, 3).

Test code:

x = torch.tensor([[2, 2, 1], [0, 1, 2]])
u, c = torch.unique(x, dim=1, return_counts=True)

Is there a problem with the implementation, or have I misunderstood the documentation?

The returned counts tensor c should have the shape [3] (not [1, 3]), which is also what I get.
This is the expected shape, since the counts tensor will have the shape output.size(dim) if dim was specified. From the docs:

  • counts (Tensor): (optional) if return_counts is True, there will be an additional returned tensor (same shape as output or output.size(dim), if dim was specified) representing the number of occurrences for each unique value or tensor.
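Running your snippet and checking the shapes confirms this; since all three columns are already unique, each gets a count of 1:

```python
import torch

x = torch.tensor([[2, 2, 1], [0, 1, 2]])
u, c = torch.unique(x, dim=1, return_counts=True)
print(u.shape)  # torch.Size([2, 3]) - all three columns are unique
print(c.shape)  # torch.Size([3]) - one count per unique column
print(c)        # tensor([1, 1, 1])
```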

Note that you will get the unique “columns” in your example.
This code snippet might give you a better idea, as it contains duplicated columns:

x = torch.tensor([[2, 2, 1], [0, 0, 2]])
u, c = torch.unique(x, dim=1, return_counts=True)
> tensor([[1, 2],
          [2, 0]])
> tensor([1, 2])

Here you can see that two unique columns were found, where [[2], [0]] is duplicated and thus has a count of 2.

For the following code:

votes = torch.tensor([[2, 2, 1],
                      [1, 1, 2]])

uni, count = torch.unique(votes, dim=1, return_counts=True)

### output
uni = tensor([[1, 2],
              [2, 1]])
count = tensor([1, 2])

When I run it, count has a shape of (2) instead of (2, 2) like uni. I understand your explanation above, but how can one get the count for each row instead?

To get the count for each row, you would have to use dim=0 in the torch.unique call.
However, dim=0 wouldn’t return counts in the shape [2, 2] either, since you would then be counting unique rows.
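For example, with the same votes tensor, dim=0 treats each whole row as one element, so each (unique) row gets a single count:

```python
import torch

votes = torch.tensor([[2, 2, 1],
                      [1, 1, 2]])

# dim=0 compares entire rows, not the elements within each row
uni, count = torch.unique(votes, dim=0, return_counts=True)
print(uni)    # both rows are unique, so uni has shape [2, 3]
print(count)  # tensor([1, 1]) - shape [2], not [2, 2]
```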

Makes sense

How can one get the count for each unique element in each row of the votes tensor, resulting in a 2D array, i.e. running unique on each row with counts?

This wouldn’t necessarily result in a 2-dimensional tensor, since each row might contain a different number of unique elements, and I think you would need to iterate over the rows:

x = torch.tensor([[0, 0, 0],
                  [1, 1, 2]])

for x_ in x.split(1, 0):
    print(torch.unique(x_, return_counts=True))

> (tensor([0]), tensor([3]))
  (tensor([1, 2]), tensor([2, 1]))

Thanks. I think I will do that.

I was trying to avoid doing it, since it is running on the GPU, but I guess it is fine.

My goal is to implement KNN, and this section of the code is meant to count the votes of the labels.
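For that use case, a loop-free sketch using one-hot encoding might work and stays on the GPU (this assumes the labels are non-negative integers and that the number of classes is known; the names below are illustrative, not from your code):

```python
import torch
import torch.nn.functional as F

# Hypothetical KNN vote counting without a Python loop:
# votes[i] holds the labels of the k nearest neighbors of query i.
votes = torch.tensor([[2, 2, 1],
                      [1, 1, 2]])
num_classes = 3  # assumed known in advance

# One-hot each label and sum over the k neighbors:
# counts has shape (num_queries, num_classes).
counts = F.one_hot(votes, num_classes).sum(dim=1)
pred = counts.argmax(dim=1)  # majority label per query
print(counts)  # tensor([[0, 1, 2], [0, 2, 1]])
print(pred)    # tensor([2, 1])
```

Unlike the per-row torch.unique loop, this always returns a fixed-size (num_queries, num_classes) tensor, with zeros for labels that received no votes.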