Thoughts on discrete-valued data for a synthetic data hashing project


I have been studying various architectures for synthetic data generation specific to Locality Sensitive Hashing (LSH). Basically, I want to train a GAN or an adversarial autoencoder (AAE) on LSH hashes and their corresponding LSH labels, to see whether it would be feasible to use PyTorch for LSH function approximation. The first problem seems to be that the LSH sequences and their labels are discrete, whereas a standard GAN generator produces continuous outputs. I’ve looked at SeqGAN as a potential fit for this, with the idea that, once trained, the Generator network could be sampled from according to label.
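To make the data concrete, here is a minimal sketch of the kind of (hash, label) pairs I mean. It assumes a sign-random-projection (random hyperplane) LSH family, which is only one of several LSH schemes, and it assumes the label is simply the index of the cluster an input vector was drawn from. The function names, cluster centers, and sizes are all hypothetical and chosen for illustration only.

```python
import random

def make_hyperplanes(n_bits, dim, rng):
    # One random Gaussian hyperplane per output bit (assumed LSH family).
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def lsh_hash(vec, hyperplanes):
    # Each bit records which side of a hyperplane the vector falls on,
    # so the hash is a sequence of discrete 0/1 values.
    return [
        1 if sum(w * x for w, x in zip(plane, vec)) >= 0 else 0
        for plane in hyperplanes
    ]

def make_dataset(n_per_label, centers, hyperplanes, rng, noise=0.05):
    # Hypothetical labelling: the label is the index of the cluster center
    # that the (noisy) input vector was sampled around.
    data = []
    for label, center in enumerate(centers):
        for _ in range(n_per_label):
            vec = [c + rng.gauss(0.0, noise) for c in center]
            data.append((lsh_hash(vec, hyperplanes), label))
    return data

rng = random.Random(0)
planes = make_hyperplanes(n_bits=16, dim=4, rng=rng)
centers = [[1.0, 0.0, 0.0, 0.0], [0.0, -1.0, 0.0, 1.0]]
dataset = make_dataset(5, centers, planes, rng)

for bits, label in dataset[:3]:
    print(label, "".join(str(b) for b in bits))
```

Every training example is a fixed-length bit sequence plus an integer label, which is why the discrete vs. continuous question comes up at all.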

Is a GAN designed specifically for discrete data a requirement for this? I am assuming that, at the least, I need a GAN that is conditioned on labels, with additional style vectors that are part of the Generator / Discriminator training.
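As a rough sketch of what I have in mind for the label and style inputs: the Generator input would be a one-hot label concatenated with a random style vector, and each output bit would have to be sampled from a categorical distribution. The sampling step below uses a Gumbel-softmax relaxation, which is a different technique from SeqGAN's policy-gradient training and is only shown here as one assumed way to keep a discrete choice approximately differentiable. It is written in plain Python for readability; PyTorch provides `torch.nn.functional.gumbel_softmax` for the same purpose. All names and sizes are hypothetical.

```python
import math
import random

def one_hot(label, n_labels):
    return [1.0 if i == label else 0.0 for i in range(n_labels)]

def generator_input(label, n_labels, style_dim, rng):
    # Conditioning as assumed here: one-hot label followed by a Gaussian
    # style (noise) vector.
    style = [rng.gauss(0.0, 1.0) for _ in range(style_dim)]
    return one_hot(label, n_labels) + style

def gumbel_softmax(logits, temperature, rng):
    # Add Gumbel(0, 1) noise to each logit, then apply a temperature-scaled
    # softmax. Lower temperatures push the result closer to a one-hot vector.
    scaled = []
    for logit in logits:
        u = max(rng.random(), 1e-12)  # guard against log(0)
        gumbel = -math.log(-math.log(u))
        scaled.append((logit + gumbel) / temperature)
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
z = generator_input(label=1, n_labels=3, style_dim=8, rng=rng)

# Hypothetical logits for a single hash bit (values 0 and 1).
soft_bit = gumbel_softmax([0.2, 1.5], temperature=0.5, rng=rng)
hard_bit = soft_bit.index(max(soft_bit))
print(len(z), soft_bit, hard_bit)
```

Whether a relaxation like this is sufficient, or whether a sequence-level method such as SeqGAN is actually needed, is exactly what I am unsure about.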

Any help would be greatly appreciated!