Is this a bug in torch.unique on GPU?

How can I get the unique values of an array without changing the order in which they first appear?
For example, for the array [5,5,3,3,4,4]:
If I use torch.unique directly on the GPU, the result is always sorted in ascending order: [3,4,5]
If I use torch.unique on the CPU, the result is still not what I want: [4,3,5]
Only numpy.unique gives me what I want: [5,3,4]
So is this a bug in torch.unique?

Code to reproduce:

import torch
import numpy as np

# original tensor on the GPU
c = torch.tensor([5, 5, 3, 3, 4, 4]).cuda()
print('raw array on gpu: ' + str(c))

# torch.unique on the GPU returns the values in ascending order
d = torch.unique(c)
print('unique by torch.unique on gpu: ' + str(d))

# torch.unique on the CPU returns the values in an arbitrary order
c2 = c.cpu()
d2 = torch.unique(c2)
print('unique by torch.unique on cpu: ' + str(d2))

# numpy.unique with return_index lets us restore the order of first appearance
c3 = c2.numpy()
d3, i = np.unique(c3, return_index=True)
d3 = d3[np.argsort(i)]
print('unique by numpy: ' + str(d3))

The GPU implementation currently works on a sorted array, so the sorted argument is ignored.
If sorted is set to False, the CPU implementation uses a std::unordered_set if the dim argument is skipped (which does not guarantee any order), or a sorted tensor and the “slow-pointer/fast-pointer” implementation if the dim argument is specified.
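
To see these code paths in action, here is a small script (a rough illustration only; the exact outputs, especially the order of the unsorted CPU path, depend on your PyTorch version and build):

import torch

x = torch.tensor([5, 5, 3, 3, 4, 4])

# CPU, sorted=False, no dim: hash-set based path, element order is not guaranteed
print(torch.unique(x, sorted=False))

# CPU, sorted=False, with dim: sort-based path, the output comes back sorted
print(torch.unique(x, sorted=False, dim=0))

# GPU: sorts internally, so the sorted argument is effectively ignored
if torch.cuda.is_available():
    print(torch.unique(x.cuda(), sorted=False))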

What is your current use case? Maybe we could use the inverse indices to fix your use case.

Thanks for your answer.
Yes, I think the inverse indices would fix my case exactly, but the GPU implementation of the unique function currently cannot do it, and I cannot find a substitute for it. I hope it can be fixed soon.

The GPU version also provides the inverse indices if you specify return_inverse=True. What exactly would you like to do that is not working currently?
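
For example, here is one possible way to get the unique values in order of first appearance on either device via return_inverse (a sketch only; the scatter_reduce call needs a fairly recent PyTorch release, and unique_in_order is just a name chosen for illustration):

import torch

def unique_in_order(x):
    # sorted unique values plus, for every element of x, the index of its unique value
    uniq, inv = torch.unique(x, sorted=True, return_inverse=True)
    pos = torch.arange(x.numel(), device=x.device)
    # for each unique value keep the smallest position at which it occurs,
    # then index the original tensor with those positions in ascending order
    first_pos = torch.full_like(uniq, x.numel()).scatter_reduce(0, inv, pos, reduce='amin')
    return x[first_pos.sort().values]

x = torch.tensor([5, 5, 3, 3, 4, 4])
print(unique_in_order(x))             # tensor([5, 3, 4])
if torch.cuda.is_available():
    print(unique_in_order(x.cuda()))  # tensor([5, 3, 4], device='cuda:0')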

I want to get the output as:
[5, 3, 4]
if the input is:
torch.tensor([5,5,3,3,4,4]).cuda()

This code should work:

x = torch.tensor([5, 5, 3, 3, 4, 4])
x_unique = torch.unique(x, sorted=True)
# first-occurrence index of each unique value, sorted into order of appearance
x_idx = (torch.cat([(x == x_u).nonzero()[0] for x_u in x_unique])).sort()[0]
print(x[x_idx])
> tensor([5, 3, 4])

Thanks.
The code works, but it's not efficient.
I mean that the function torch.unique should be able to give the answer I want directly.

Stumbled upon this now as well. If the behaviour differs between devices, maybe include a warning or similar? I often switch between GPU and CPU, and I would not have expected this difference.

Also, for my case (5x5 tensors), it was significantly faster to move the tensor to the CPU, compute the unique values there, and move the result back.

def unique_via_cpu(x, device):
    # compute unique on the CPU (unsorted) and move the result back to the original device
    return torch.unique(x.cpu(), sorted=False).to(device)

Compared to the function posted above by @ptrblck:
118 µs ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
vs.
2.86 ms ± 31.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Maybe setting sorted=True would prevent your issues.
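
For instance, if a consistent (sorted) result across devices is enough for you, passing sorted=True explicitly should give the same output on CPU and GPU:

import torch

x = torch.tensor([5, 5, 3, 3, 4, 4])
print(torch.unique(x, sorted=True))             # tensor([3, 4, 5])
if torch.cuda.is_available():
    print(torch.unique(x.cuda(), sorted=True))  # tensor([3, 4, 5], device='cuda:0')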

Anyway, we are currently working on some performance regressions, so you should see some improvements in the next couple of days. Sorry for the inconvenience.
