Getting the vector representation of each document, where each document is written in one row of a CSV file (example.csv here), using https://github.com/inejc/paragraph-vectors
I followed all the steps in the repository, and afterwards my data directory looks like this:
[jalal@goku data]$ ls -ltra
total 308
-rw-r--r--.  1 jalal cs-grad    863 Nov  9 00:59 example.csv
drwxr-xr-x. 10 jalal cs-grad   4096 Nov  9 00:59 ..
-rw-r--r--.  1 jalal cs-grad 136981 Nov  9 06:53 example_model.dbow_numnoisewords.2_vecdim.100_batchsize.32_lr.0.001000_epoch.8_loss.1.108974.pth.tar
-rw-r--r--.  1 jalal cs-grad 136981 Nov  9 07:02 example_model.dbow_numnoisewords.2_vecdim.100_batchsize.32_lr.0.001000_epoch.9_loss.1.145683.pth.tar
-rw-r--r--.  1 jalal cs-grad  17395 Nov  9 07:02 example_model.dbow_numnoisewords.2_vecdim.100_batchsize.32_lr.0.001000.png
-rw-r--r--.  1 jalal cs-grad      9 Nov  9 07:05 example_model.dbow_numnoisewords.2_vecdim.100_batchsize.32_lr.0.001000.csv
drwxr-xr-x.  2 jalal cs-grad   4096 Nov  9 07:05 .
How can I actually get the vector representation of each document?
In case you are interested, here is what example.csv looks like:
[jalal@goku data]$ cat example.csv
text
"In the week before their departure to Arrakis, when all the final scurrying about had reached a nearly unbearable frenzy, an old crone came to visit the mother of the boy, Paul."
"It was a warm night at Castle Caladan, and the ancient pile of stone that had served the Atreides family as home for twenty-six generations bore that cooled-sweat feeling it acquired before a change in the weather."
"The old woman was let in by the side door down the vaulted passage by Paul's room and she was allowed a moment to peer in at him where he lay in his bed."
"By the half-light of a suspensor lamp, dimmed and hanging near the floor, the awakened boy could see a bulky female shape at his door, standing one step ahead of his mother. The old woman was a witch shadow - hair like matted spiderwebs, hooded 'round darkness of features, eyes like glittering jewels."
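So the input format is a single text column with one document per row. For clarity, here is a minimal sketch of how such a file parses, using only the Python standard library; the quoted rows below are abbreviated stand-ins for the actual file content:

```python
import csv
import io

# Abbreviated stand-in for example.csv: a single "text" column,
# one document per row.
sample = io.StringIO(
    '"text"\n'
    '"In the week before their departure to Arrakis..."\n'
    '"It was a warm night at Castle Caladan..."\n'
    '"The old woman was let in by the side door..."\n'
    '"By the half-light of a suspensor lamp..."\n'
)

reader = csv.reader(sample)
header = next(reader)                    # the column name row: ['text']
documents = [row[0] for row in reader]   # one string per document

print(len(documents))                    # 4 documents, matching the training log
```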
Commands I ran:
[jalal@goku paragraphvec]$ python train.py start --data_file_name 'example.csv' --num_epochs 100 --batch_size 32 --num_noise_words 2 --vec_dim 100 --lr 1e-3
Dataset comprised of 4 documents.
Vocabulary size is 109.
Training started.
[jalal@goku paragraphvec]$ python export_vectors.py start --data_file_name 'example.csv' --model_file_name /scratch2/NAACL2018/text_experiment/paragraph-vectors/models/example_model.dbow_numnoisewords.2_vecdim.100_batchsize.32_lr.0.001000_epoch.86_loss.0.827747.pth.tar
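For reference, this is how I expected to read the exported vectors back once export_vectors.py produces them. It is only a sketch under the assumption that the exported CSV contains one row of floating-point components per document; the helper name and the file path in the usage comment are hypothetical:

```python
import csv

def load_vectors(path):
    """Read a CSV of document vectors, one row of floats per document.

    Assumes plain numeric rows; adjust if the actual export carries a
    header line or a document-id column.
    """
    vectors = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if not row:          # skip blank lines
                continue
            vectors.append([float(x) for x in row])
    return vectors

# Hypothetical usage (file name shortened):
# vectors = load_vectors("example_model.dbow_..._lr.0.001000.csv")
# print(len(vectors), len(vectors[0]))  # expecting 4 documents x 100 dims
```

In my case, however, the exported CSV above is only 9 bytes, so it clearly does not contain the vectors yet.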