I have a tensor of size (2, 3). After running it through torch.unique with dim=1, the unique tensor has size (2, 3), but the counts tensor only has size (1, 3). I was expecting the counts to have size (2, 3) as well.
Test code:
x = torch.tensor([[2, 2, 1], [0, 1, 2]])
u, c = torch.unique(x, dim=1, return_counts=True)
Is there a problem with the implementation, or have I misunderstood the documentation?
The returned counts tensor c should have the shape [3] (not [1, 3]), which is also what I get.
This is the expected shape, since the counts tensor will have the shape output.size(dim), if dim was specified. From the docs:
counts (Tensor): (optional) if return_counts is True, there will be an additional returned tensor (same shape as output or output.size(dim), if dim was specified) representing the number of occurrences for each unique value or tensor.
Note that with dim=1 you will get the unique “columns” in your example, and one count per unique column.
This code snippet might give you a better idea, as it contains duplicated columns:
x = torch.tensor([[2, 2, 1], [0, 0, 2]])
u, c = torch.unique(x, dim=1, return_counts=True)
print(u)
> tensor([[1, 2],
          [2, 0]])
print(c)
> tensor([1, 2])
Here you can see that two unique columns were found. The column [[2], [0]] appears twice in x and thus has a count of 2.
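To make the shape relationship explicit, here is a minimal sketch using the same tensor as above. It only checks that the counts tensor has one entry per unique column, i.e. its length equals u.size(1). The name shapes_match is made up for this illustration.

```python
import torch

x = torch.tensor([[2, 2, 1], [0, 0, 2]])
u, c = torch.unique(x, dim=1, return_counts=True)

# u keeps both rows but only the unique columns.
print(u.shape)  # torch.Size([2, 2])
# c is 1-dimensional: one count per unique column.
print(c.shape)  # torch.Size([2])

# The length of c equals the size of u along the dim passed to torch.unique.
shapes_match = c.shape == (u.size(1),)
print(shapes_match)  # True
```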
When I run it, the counts tensor has a shape of (2) instead of (2, 2), unlike the unique tensor. I understand your explanation above, but how can one get the counts for each row instead?
To get the count for each row, you would have to use dim=0 in the torch.unique call.
However, dim=0 would not return counts in the shape [2, 2] either, as you would then only be counting unique rows.
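As a quick illustration of the dim=0 case, here is a small sketch. The input is a made-up variation of the earlier example with one row duplicated, so that the counts are not all 1.

```python
import torch

# Hypothetical input: the first row appears twice.
x = torch.tensor([[2, 2, 1],
                  [0, 1, 2],
                  [2, 2, 1]])
rows, counts = torch.unique(x, dim=0, return_counts=True)

print(rows)    # the unique rows, shape [2, 3]
print(counts)  # one count per unique row, shape [2]
```

Note that this counts whole rows. Counting the values inside each row is a separate operation.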
Counting the unique elements within each row wouldn’t necessarily result in a 2-dimensional tensor, since each row might contain a different number of unique elements, so I think you would need to iterate over the rows:
x = torch.tensor([[0, 0, 0],
                  [1, 1, 2]])
for x_ in x.split(1, 0):
    print(torch.unique(x_, return_counts=True))
> (tensor([0]), tensor([3]))
  (tensor([1, 2]), tensor([2, 1]))
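If you do need a single 2-dimensional result, one possible workaround is to count over a fixed set of values, so that every row gets the same number of columns. The sketch below is not from the thread: it assumes the values are small non-negative integers and uses torch.bincount per row, where column j of per_row_counts holds how often the value j occurs in that row. The names num_values and per_row_counts are made up for this example.

```python
import torch

x = torch.tensor([[0, 0, 0],
                  [1, 1, 2]])

# Assumption: values are non-negative integers, so each value can index a column.
num_values = int(x.max()) + 1

# Count each row separately and pad every result to the same length.
per_row_counts = torch.stack(
    [torch.bincount(row, minlength=num_values) for row in x]
)
print(per_row_counts)
# tensor([[3, 0, 0],
#         [0, 2, 1]])
```

Values that do not occur in a row simply get a count of 0, which is what keeps the result rectangular.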