torch.unique not returning expected values

haltaha · August 7, 2020, 1:59pm

I have a tensor of size (2, 3). After running it through torch.unique with dim=1, I get a tensor of size (2, 3) for the unique values but only a tensor of size (1, 3) for the counts. I was expecting the counts to have size (2, 3) as well.

Test code:

import torch

x = torch.tensor([[2, 2, 1], [0, 1, 2]])
u, c = torch.unique(x, dim=1, return_counts=True)

Is there a problem with the implementation, or have I misunderstood the documentation?

ptrblck · August 10, 2020, 7:00am

The returned counts tensor c should have the shape [3] (not [1, 3]), which is also what I get.
This is the expected shape, since the counts tensor has the shape output.size(dim) if dim is specified. From the docs:

counts (Tensor): (optional) if return_counts is True, there will be an additional returned tensor (same shape as output or output.size(dim), if dim was specified) representing the number of occurrences for each unique value or tensor.
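To make the shape rule concrete, here is a small sketch comparing the three cases on the tensor from the original question. The shapes in the comments follow from the documented rule: without dim the input is flattened, with dim the counts have one entry per unique slice along that dimension.

```python
import torch

x = torch.tensor([[2, 2, 1], [0, 1, 2]])

# Without dim, the input is flattened: counts has the same shape as the output.
u, c = torch.unique(x, return_counts=True)
print(u.shape, c.shape)  # torch.Size([3]) torch.Size([3])

# With dim=0, unique rows are returned; counts has shape output.size(0).
u0, c0 = torch.unique(x, dim=0, return_counts=True)
print(u0.shape, c0.shape)  # torch.Size([2, 3]) torch.Size([2])

# With dim=1, unique columns are returned; counts has shape output.size(1).
u1, c1 = torch.unique(x, dim=1, return_counts=True)
print(u1.shape, c1.shape)  # torch.Size([2, 3]) torch.Size([3])
```

In the dim=1 case all three columns of x are distinct, so every count is 1 and counts has shape [3], matching what was reported above.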

Note that you will get the unique “columns” in your example.
This code snippet might give you a better idea, as it contains duplicated columns:

import torch

x = torch.tensor([[2, 2, 1], [0, 0, 2]])
u, c = torch.unique(x, dim=1, return_counts=True)
print(u)
> tensor([[1, 2],
          [2, 0]])
print(c)
> tensor([1, 2])

Here you can see that two unique columns were found, where the column [[2], [0]] is duplicated and thus has a count of 2.
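A related flag, return_inverse=True, shows where the counts come from. The sketch below reuses the snippet above: the inverse indices map each input column to its unique column, so counting those indices with torch.bincount reproduces the counts tensor, and indexing the unique columns with them rebuilds the input.

```python
import torch

x = torch.tensor([[2, 2, 1], [0, 0, 2]])

# return_inverse maps every input column to the index of its unique column in u.
u, inv, c = torch.unique(x, dim=1, return_inverse=True, return_counts=True)
print(inv.tolist())  # [1, 1, 0]

# Counting how often each unique index appears in inv reproduces c.
print(torch.bincount(inv, minlength=u.size(1)).tolist())  # [1, 2]

# Indexing u with inv along dim=1 reconstructs the original tensor.
print(torch.equal(u[:, inv], x))  # True
```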

JosueCom · August 19, 2020, 12:13am

For the following code:

import torch as th

votes = th.tensor([[2, 2, 1],
                   [1, 1, 2]])

uni, count = th.unique(votes, dim=1, return_counts=True)

# output
# uni = tensor([[1, 2],
#               [2, 1]])
# count = tensor([1, 2])

When I run it, count has shape (2), unlike uni, which has shape (2, 2). I understand your explanation above, but how can one get the counts for each row instead?

ptrblck · August 19, 2020, 4:03am

To get the count for each row, you would have to use dim=0 in the torch.unique call.
However, dim=0 would not return counts in the shape [2, 2] either, since you would then be counting unique rows.
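For comparison, here is a sketch with dim=0 on a votes tensor that contains a duplicated row (the third row is added for illustration and is not from the thread). The counts tensor is still 1-dimensional, with one entry per unique row.

```python
import torch

votes = torch.tensor([[2, 2, 1],
                      [1, 1, 2],
                      [2, 2, 1]])

# dim=0 treats each row as a single element: one count per unique row.
rows, row_counts = torch.unique(votes, dim=0, return_counts=True)
print(rows.tolist())        # [[1, 1, 2], [2, 2, 1]]
print(row_counts.tolist())  # [1, 2]
```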

JosueCom · August 19, 2020, 6:29am

Makes sense.

How can one get the count of each unique element in each row of the votes tensor, resulting in a 2D array, i.e. running unique with counts on each row?

ptrblck · August 19, 2020, 7:46am

This wouldn’t necessarily result in a 2-dimensional tensor, since each row might contain a different number of unique elements, so I think you would need to iterate over the rows:

import torch

x = torch.tensor([[0, 0, 0],
                  [1, 1, 2]])

for x_ in x.split(1, 0):
    print(torch.unique(x_, return_counts=True))

> (tensor([0]), tensor([3]))
  (tensor([1, 2]), tensor([2, 1]))
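If the loop turns out to be a bottleneck, one possible loop-free variant (a common offset trick, not something proposed in the thread) shifts each row into its own value range so that a single torch.unique call handles all rows at once. It assumes the entries are non-negative integers below a known bound, here a hypothetical num_classes = 3; the result is still split back into one ragged (values, counts) pair per row.

```python
import torch

x = torch.tensor([[0, 0, 0],
                  [1, 1, 2]])

# Assumption: all entries are non-negative integers smaller than num_classes.
num_classes = 3

# Shift row i by i * num_classes so values from different rows never collide.
row_ids = torch.arange(x.size(0), device=x.device).unsqueeze(1)
shifted = x + row_ids * num_classes

# One torch.unique call over the flattened tensor, no Python loop over rows.
u, counts = torch.unique(shifted, return_counts=True)
rows = torch.div(u, num_classes, rounding_mode="floor")  # which row each entry came from
values = u % num_classes                                 # the original value

# Split the flat results back into one (values, counts) pair per row.
sizes = torch.bincount(rows, minlength=x.size(0)).tolist()
per_row = [(v.tolist(), c.tolist())
           for v, c in zip(values.split(sizes), counts.split(sizes))]
print(per_row)  # [([0], [3]), ([1, 2], [2, 1])]
```

The final split still needs the per-row sizes on the host (the .tolist() call), so it does not remove every synchronization point; it only replaces the per-row unique calls with a single one.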

JosueCom · August 19, 2020, 8:00am

Thanks. I think I will do that.

I was trying to avoid that since it is running on the GPU, but I guess it is fine.

My goal is to implement kNN, and this section of the code is meant to count the votes for the labels.
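For the kNN use case specifically, the ragged shape can be sidestepped by counting votes into a dense [num_queries, num_classes] tensor, where classes that receive no votes simply get 0. The sketch below assumes each row of votes holds the labels of one query's k nearest neighbours and that labels are integers in [0, num_classes); num_classes = 3 is a hypothetical value. It uses scatter_add_ rather than torch.unique and runs on whatever device the input lives on.

```python
import torch

# Assumption: each row holds the labels of the k=3 nearest neighbours of one query.
votes = torch.tensor([[2, 2, 1],
                      [1, 1, 2]])
num_classes = 3  # assumption: labels are integers in [0, num_classes)

# Dense per-row vote counts; vote_counts[i, label] += 1 for every vote in row i.
vote_counts = torch.zeros(votes.size(0), num_classes,
                          dtype=torch.long, device=votes.device)
vote_counts.scatter_add_(1, votes, torch.ones_like(votes))
print(vote_counts.tolist())  # [[0, 1, 2], [0, 2, 1]]

# Majority vote per query; on ties, argmax is documented to return the first maximal index.
prediction = vote_counts.argmax(dim=1)
print(prediction.tolist())  # [2, 1]
```

The fixed width (num_classes) is what makes a 2D result possible here: instead of listing only the labels present in each row, every row reports a count for every class.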