Defining a hash function for a multi-dimensional tensor

Hi everyone!
I need to define a hash function (a mapping) whose output is uniformly distributed.

Signature: Z^d -> X, where X = {1, 2, 3, …, n} (n is a fixed integer).
That is, a function that takes a d-dimensional tensor as input and returns a corresponding integer.

The condition is that the output of the hash function should be uniformly distributed over X. Is there a built-in PyTorch function that can help here, or any other way to implement this?

Thanks in advance.

It seems like PyTorch lacks this, but you can do:
hash(pickle.dumps(tensor)) % n
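
For example, a minimal usage sketch (the tensor t and n = 10 are just placeholders):

import pickle
import torch

t = torch.randint(-100, 100, (3, 4))   # any d-dimensional tensor
bucket = hash(pickle.dumps(t)) % 10    # integer in {0, ..., 9}

One thing to keep in mind: Python's built-in hash() of bytes is randomized per interpreter session unless PYTHONHASHSEED is fixed, so the buckets are only stable within a single run.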


Thanks, @googlebot Alex. I actually need to do this task in parallel. Is there a PyTorch utility that can help run such a function in parallel?

E.g., methods like torch.sum() and torch.mean() allow us to specify a dimension over which to reduce. I am looking for a similar way to hash the slices of a multi-dimensional input along a given dimension.
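
To make the desired behaviour concrete, here is a small illustration using torch.sum as a stand-in for the reduction I'm after (the shapes are just an example):

import torch

x = torch.randint(0, 100, (5, 3, 4))   # example integer input, shape (5, 3, 4)
print(torch.sum(x, dim=-1).shape)      # torch.Size([5, 3]); the last dim is reduced
# a dim-aware hash would reduce the same way, mapping each length-4 slice
# to an integer in {1, ..., n}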

I’m not aware of a universal routine for that.

If you mean float tensors with known ranges and reasonable entropy, “surrogate” hash values based on standard reductions can be used, e.g.:

import torch
# quantize, take the int representation, sum over the last dim, fold into 10 buckets
hashes = torch.quantize_per_tensor(torch.randn(1000, 8), 0.1, 0, torch.qint32).int_repr().sum(-1) % 10
torch.bincount(hashes)

tensor([ 92,  93, 106, 113,  94, 101,  97,  89, 100, 115])
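
If you need the dim-wise behaviour from your earlier question, the same idea can be wrapped into a small helper. This is just a sketch under the same assumptions (float inputs with a known range); surrogate_hash and the scale/zero-point values are illustrative, not a PyTorch API:

import torch

def surrogate_hash(x, n, dim=-1):
    # quantize the float values, reinterpret them as integers,
    # reduce along `dim`, then fold the sums into {0, ..., n - 1}
    q = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.qint32)
    return q.int_repr().sum(dim) % n

hashes = surrogate_hash(torch.randn(1000, 8), n=10, dim=-1)   # shape (1000,)
torch.bincount(hashes, minlength=10)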
