Is this a bug in torch.unique on GPU?

How can I get the unique values of an array without changing the order of its elements?
For example, for the array [5,5,3,3,4,4]:
If I use torch.unique directly on the GPU, the result is always sorted in ascending order: [3,4,5]
If I use torch.unique on the CPU, the result is still not what I want: [4,3,5]
Only numpy.unique gets what I wanted: [5,3,4]
So is this a bug in torch.unique?

code to test:

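import torch
import numpy as np

# Assumed setup (the definitions are missing from the post): the example array on the GPU.
c = torch.tensor([5, 5, 3, 3, 4, 4]).cuda()
print('raw array on gpu: ' + str(c))
d = torch.unique(c)
print('unique by torch.unique on gpu: ' + str(d))
d2 = torch.unique(c.cpu())  # assumed: the same call on a CPU copy
print('unique by torch.unique on cpu: ' + str(d2))
d3 = np.unique(c.cpu().numpy())  # assumed: the NumPy call on the same data
print('unique by numpy: ' + str(d3))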

The GPU implementation currently works on a sorted array, so the sorted argument is ignored.
If sorted is set to False, the CPU implementation uses std::unordered_set when the dim argument is omitted (which does not guarantee any order), or a sorted tensor and the “slow-pointer/fast-pointer” implementation when the dim argument is specified.
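In practice this means the output of sorted=False should be treated as an unordered set of values. A minimal sketch of code that does not depend on the order, using the array from the question (the variable names are only for illustration):

```python
import torch

x = torch.tensor([5, 5, 3, 3, 4, 4])

# With sorted=False the order of the result is unspecified,
# so only compare it as a set of values.
u = torch.unique(x, sorted=False)
values = set(u.tolist())
print(values == {3, 4, 5})  # True

# With sorted=True the result is in ascending order.
u_sorted = torch.unique(x, sorted=True)
print(u_sorted)  # tensor([3, 4, 5])
```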

What is your current use case? Maybe we could use the inverse indices to fix your use case.

Thanks for your answer.
Yeah. I think the inverse indices would fix my case exactly, but the GPU implementation of unique currently cannot do that, and I cannot find a substitute for it. I hope it can be fixed soon.

The GPU version also provides the inverse indices if you specify return_inverse=True. What would you like to do exactly which is not working currently?

I want to get the output:
[5, 3, 4]
if the input is:
[5, 5, 3, 3, 4, 4]

This code should work:

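x = torch.tensor([5,5,3,3,4,4])
x_unique = torch.unique(x, sorted=True)
# index of the first occurrence of each unique value, sorted by position in x
x_idx = torch.cat([(x==x_u).nonzero()[0] for x_u in x_unique]).sort()[0]
print(x[x_idx])
> tensor([5, 3, 4])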

The code works, but it's not efficient.
What I mean is that torch.unique itself should return the answer I want.
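A vectorized variant, along the lines of the inverse indices mentioned earlier in the thread, might look like the sketch below. It avoids the Python loop over unique values. The function name unique_preserve_order is hypothetical, and the sketch assumes a PyTorch version that provides Tensor.scatter_reduce_ (1.12 or later); the input is flattened, so it only covers the case without a dim argument.

```python
import torch

def unique_preserve_order(x):
    """Unique values of x in order of first appearance (hypothetical helper)."""
    flat = x.reshape(-1)
    uniq, inverse = torch.unique(flat, sorted=True, return_inverse=True)
    # Position of every element in the flattened input.
    positions = torch.arange(flat.numel(), device=flat.device)
    # Start with a sentinel larger than any valid position, then keep the
    # minimum position seen for each unique value (its first occurrence).
    first = torch.full((uniq.numel(),), flat.numel(),
                       dtype=torch.long, device=flat.device)
    first.scatter_reduce_(0, inverse, positions, reduce="amin")
    # Reorder the sorted unique values by where they first appeared.
    return uniq[first.argsort()]

x = torch.tensor([5, 5, 3, 3, 4, 4])
print(unique_preserve_order(x))  # tensor([5, 3, 4])
```

The same function should work on a CUDA tensor, since every step is a regular tensor operation, but I have only reasoned through the CPU case here.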

Stumbled upon this now as well. If the behaviour differs between devices, maybe include a warning or similar? I often switch between GPU and CPU, and I would not have expected this discrepancy.

Also, for my case (5x5 tensors), it was significantly faster to move the tensor to the CPU, run unique there, and move the result back again.

def unique_via_cpu(x, device):
    return torch.unique(x.cpu(), sorted=False).to(device)

Compared to the function posted above by @ptrblck:
118 µs ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
2.86 ms ± 31.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
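For anyone who wants to reproduce a comparison like this, here is a rough sketch of a timing harness. The helper names timed and unique_first_occurrence are hypothetical, the input is flattened because the loop version indexes 1-D tensors, and the random 5x5 tensor is only an assumed stand-in for the real data. The numbers will differ by machine and PyTorch version.

```python
import timeit
import torch

def unique_via_cpu(x, device):
    return torch.unique(x.cpu(), sorted=False).to(device)

def unique_first_occurrence(x):
    # Loop-based version from earlier in the thread (expects a 1-D tensor).
    x_unique = torch.unique(x, sorted=True)
    x_idx = torch.cat([(x == x_u).nonzero()[0] for x_u in x_unique]).sort()[0]
    return x[x_idx]

def timed(fn, *args, repeats=100):
    """Average seconds per call, synchronizing so CUDA work is included."""
    def run():
        out = fn(*args)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        return out
    run()  # warm-up call, not timed
    return timeit.timeit(run, number=repeats) / repeats

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.randint(0, 10, (5, 5), device=device).flatten()

t_cpu = timed(unique_via_cpu, x, device)
t_loop = timed(unique_first_occurrence, x)
print(f"unique_via_cpu:          {t_cpu * 1e6:.1f} µs per call")
print(f"unique_first_occurrence: {t_loop * 1e6:.1f} µs per call")
```

Note that the two functions agree only on the set of values: the CPU round trip with sorted=False makes no promise about order.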

Maybe setting the default to sorted=True would prevent your issues.

Anyway, we are currently working on some performance regressions, so you should see some improvements in the next couple of days. Sorry for the inconvenience.
