I’m trying to confirm that I’ve found a bug in torch.multinomial when sampling without replacement. It seems to use only a 32-bit random number as a ‘seed’ for its own random number generator when choosing all the samples. If so, it can generate at most 2^32 distinct permutations without replacement, even when far more are possible (for the 26 letters used below there are 26!, roughly 4 × 10^26). I wrote a short app showing that multinomial without replacement consistently produces a DUPLICATE PERMUTATION within a few tens of thousands of iterations. Am I missing something?
Here’s the app:
import torch

letters = [*"abcdefghijklmnopqrstuvwxyz"]
# Equal weights, so every permutation of the letters should be equally likely.
p = torch.tensor([1.0] * len(letters))
# Maps each permutation seen so far to the iteration at which it first appeared.
found = {}
for count in range(1, 1000000):
    # Drawing all 26 indices without replacement yields a random permutation.
    t = torch.multinomial(p, num_samples=len(letters), replacement=False).tolist()
    result = ''.join(letters[n] for n in t)
    if result in found:
        print(f"We have generated {count:,} strings and the most recent one was '{result}'")
        print(f"This is the same as the {found[result]:,} generated string which was also '{result}'")
        break
    found[result] = count
The output looks like this:
We have generated 31,869 strings and the most recent one was 'kmfuqyesxvngithplacwbjdroz'
This is the same as the 1,375 generated string which was also 'kmfuqyesxvngithplacwbjdroz'
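As a rough sanity check on the numbers above, here is a minimal sketch of the standard birthday-problem approximation. It compares how likely a repeat is within 31,869 draws if only 2^32 outcomes are reachable versus if all 26! permutations are reachable. The function names are my own for illustration, and the 2^32 figure is just my hypothesis about the sampler, not something I have confirmed in the PyTorch source.

```python
import math


def collision_probability(num_draws, num_outcomes):
    """Approximate chance of at least one repeat among num_draws uniform
    draws from num_outcomes equally likely outcomes (birthday approximation:
    1 - exp(-n(n-1) / (2N)))."""
    exponent = num_draws * (num_draws - 1) / (2 * num_outcomes)
    return -math.expm1(-exponent)


def expected_draws_to_first_collision(num_outcomes):
    """Approximate expected number of draws before the first repeat,
    sqrt(pi * N / 2), valid for large N."""
    return math.sqrt(math.pi * num_outcomes / 2)


draws = 31_869                      # iteration at which my run hit a duplicate
hypothesised_states = 2 ** 32       # assumption: only a 32-bit seed is used
all_permutations = math.factorial(26)

print(f"P(repeat) with 2^32 outcomes: {collision_probability(draws, hypothesised_states):.3f}")
print(f"P(repeat) with 26! outcomes:  {collision_probability(draws, all_permutations):.3e}")
print(f"Expected draws to first repeat with 2^32 outcomes: "
      f"{expected_draws_to_first_collision(hypothesised_states):,.0f}")
```

Under the 2^32 assumption, a repeat within a few tens of thousands of draws is unremarkable. With all 26! permutations reachable it should be practically impossible, which is why the duplicates look like a bug to me.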