Getting the vector representation of each document, where each document is written in one row of a CSV file (example.csv here), using https://github.com/inejc/paragraph-vectors
I followed all the steps in the repository, and afterwards my data directory looks like this:
[jalal@goku data]$ ls -ltra
total 308
-rw-r--r--.  1 jalal cs-grad    863 Nov  9 00:59 example.csv
drwxr-xr-x. 10 jalal cs-grad   4096 Nov  9 00:59 ..
-rw-r--r--.  1 jalal cs-grad 136981 Nov  9 06:53 example_model.dbow_numnoisewords.2_vecdim.100_batchsize.32_lr.0.001000_epoch.8_loss.1.108974.pth.tar
-rw-r--r--.  1 jalal cs-grad 136981 Nov  9 07:02 example_model.dbow_numnoisewords.2_vecdim.100_batchsize.32_lr.0.001000_epoch.9_loss.1.145683.pth.tar
-rw-r--r--.  1 jalal cs-grad  17395 Nov  9 07:02 example_model.dbow_numnoisewords.2_vecdim.100_batchsize.32_lr.0.001000.png
-rw-r--r--.  1 jalal cs-grad      9 Nov  9 07:05 example_model.dbow_numnoisewords.2_vecdim.100_batchsize.32_lr.0.001000.csv
drwxr-xr-x.  2 jalal cs-grad   4096 Nov  9 07:05 .
How can I actually get the vector representation of each document?
In case you are interested, here is what example.csv looks like:
[jalal@goku data]$ cat example.csv
text
"In the week before their departure to Arrakis, when all the final scurrying about had reached a nearly unbearable frenzy, an old crone came to visit the mother of the boy, Paul."
"It was a warm night at Castle Caladan, and the ancient pile of stone that had served the Atreides family as home for twenty-six generations bore that cooled-sweat feeling it acquired before a change in the weather."
"The old woman was let in by the side door down the vaulted passage by Paul's room and she was allowed a moment to peer in at him where he lay in his bed."
"By the half-light of a suspensor lamp, dimmed and hanging near the floor, the awakened boy could see a bulky female shape at his door, standing one step ahead of his mother. The old woman was a witch shadow - hair like matted spiderwebs, hooded 'round darkness of features, eyes like glittering jewels."
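So the input format is a single text column with one document per row. For clarity, here is a minimal sketch of how such a file parses, using only the Python standard library; the quoted rows below are abbreviated stand-ins for the actual file content:

```python
import csv
import io

# Abbreviated stand-in for example.csv: a single "text" column,
# one document per row.
sample = io.StringIO(
    '"text"\n'
    '"In the week before their departure to Arrakis..."\n'
    '"It was a warm night at Castle Caladan..."\n'
    '"The old woman was let in by the side door..."\n'
    '"By the half-light of a suspensor lamp..."\n'
)

reader = csv.reader(sample)
header = next(reader)                    # the column name row: ['text']
documents = [row[0] for row in reader]   # one string per document

print(len(documents))                    # 4 documents, matching the training log
```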
Commands I ran:
[jalal@goku paragraphvec]$ python train.py start --data_file_name 'example.csv' --num_epochs 100 --batch_size 32 --num_noise_words 2 --vec_dim 100 --lr 1e-3
Dataset comprised of 4 documents.
Vocabulary size is 109.
Training started.
[jalal@goku paragraphvec]$ python export_vectors.py start --data_file_name 'example.csv' --model_file_name /scratch2/NAACL2018/text_experiment/paragraph-vectors/models/example_model.dbow_numnoisewords.2_vecdim.100_batchsize.32_lr.0.001000_epoch.86_loss.0.827747.pth.tar
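For reference, this is how I expected to read the exported vectors back once export_vectors.py produces them. It is only a sketch under the assumption that the exported CSV contains one row of floating-point components per document; the helper name and the file path in the usage comment are hypothetical:

```python
import csv

def load_vectors(path):
    """Read a CSV of document vectors, one row of floats per document.

    Assumes plain numeric rows; adjust if the actual export carries a
    header line or a document-id column.
    """
    vectors = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if not row:          # skip blank lines
                continue
            vectors.append([float(x) for x in row])
    return vectors

# Hypothetical usage (file name shortened):
# vectors = load_vectors("example_model.dbow_..._lr.0.001000.csv")
# print(len(vectors), len(vectors[0]))  # expecting 4 documents x 100 dims
```

In my case, however, the exported CSV above is only 9 bytes, so it clearly does not contain the vectors yet.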