I’m following the word embedding tutorial from the PyTorch tutorials page here. I compared the values in the nn.Embedding layer before and after training, and they changed very little. I am wondering whether this is due to:
- the small scale of the problem (tiny test document, small embedding size)
- me comparing the wrong things (see the sketch after this list)
- not enough training, so the loss is still too high
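For the second point, this is roughly how I compare the raw weights (a minimal sketch, not my exact code; the 97 x 10 shape matches the files below, and embeddings stands in for the tutorial model's embeddings layer):

import torch
import torch.nn as nn

VOCAB_SIZE, EMBEDDING_DIM = 97, 10           # matches the "97 10" header in the files below
embeddings = nn.Embedding(VOCAB_SIZE, EMBEDDING_DIM)

before = embeddings.weight.detach().clone()  # snapshot before training
# ... run the tutorial's training loop here ...
after = embeddings.weight.detach()           # same matrix after training

print(torch.norm(after - before))            # overall change (Frobenius norm)
print((after - before).abs().max())          # largest single-weight change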
The code is pretty much verbatim from the tutorial. I wrote the vectors to disk in a format similar to what word2vec/fastText output. Here is the head of the vector files before and after training:
==> before.vec <==
97 10
held: -0.47635 -0.00538 0.03114 -0.91018 0.82155 0.34278 0.41364 -0.54206 -0.15564 -0.90375
small -1.08468 -0.75569 -1.08190 -1.49247 -0.97938 -1.23464 2.02468 1.95724 0.92244 1.60459
proud -1.24031 1.05229 -0.21792 1.41590 -0.66731 1.00172 0.62765 1.17728 1.71781 0.74608
to 1.35696 -0.93113 1.62786 -0.61536 0.56246 0.59689 0.19493 0.01508 -0.84847 -1.13680
his -0.64310 -0.43424 0.56751 0.56234 1.47800 2.49974 -2.00864 -1.73962 0.92000 0.89800
say, -0.60591 -2.21703 0.80868 0.51814 0.73763 -0.12399 -0.36284 -0.03719 0.75559 1.37157
a 0.81077 1.01812 0.61868 -1.40281 -0.49369 -0.90551 0.00841 -0.11104 0.87211 0.16391
thou -0.96502 -0.49956 -0.20308 1.31124 -1.23211 2.14177 -0.68637 0.40443 -0.26006 0.37999
field, 0.38589 0.86313 -0.10661 -0.19126 2.88197 -0.80228 0.42361 1.36436 -0.52477 1.39875

==> after.vec <==
97 10
held: -0.47579 -0.00538 0.03034 -0.91066 0.82071 0.34304 0.41407 -0.54159 -0.15510 -0.90298
small -1.08346 -0.75622 -1.08146 -1.49210 -0.97906 -1.23362 2.02298 1.95819 0.92355 1.60536
proud -1.24031 1.05193 -0.21596 1.41722 -0.66788 1.00245 0.62989 1.17841 1.71748 0.74604
to 1.35664 -0.93089 1.62765 -0.61374 0.56187 0.59659 0.19331 0.01426 -0.84901 -1.13577
his -0.64278 -0.43336 0.56731 0.56225 1.47831 2.50070 -2.00857 -1.73981 0.91986 0.89775
say, -0.60615 -2.21782 0.80835 0.51809 0.73800 -0.12409 -0.36205 -0.03710 0.75553 1.37224
a 0.81011 1.01831 0.61801 -1.40217 -0.49177 -0.90454 0.00985 -0.11124 0.87107 0.16403
thou -0.96507 -0.49984 -0.20311 1.31315 -1.23123 2.14063 -0.68915 0.40313 -0.26036 0.37985
field, 0.38665 0.86381 -0.10656 -0.19033 2.88151 -0.80251 0.42322 1.36459 -0.52421 1.39849
As you can see, they are almost the same; the differences only show up around the third or fourth decimal place. Here are the losses of the 10-epoch run:
[tensor([ 524.3104], device='cuda:0'),
tensor([ 521.7900], device='cuda:0'),
tensor([ 519.2901], device='cuda:0'),
tensor([ 516.8104], device='cuda:0'),
tensor([ 514.3484], device='cuda:0'),
tensor([ 511.9043], device='cuda:0'),
tensor([ 509.4778], device='cuda:0'),
tensor([ 507.0662], device='cuda:0'),
tensor([ 504.6693], device='cuda:0'),
tensor([ 502.2869], device='cuda:0')]
Clearly the loss is still high, even though it is going down. So my question is: am I doing something wrong? After training, the content of the nn.Embedding layer is the word embeddings, right? I didn’t include the full code because I didn’t do anything different from the tutorial except write the vectors out to disk.
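The only extra part is roughly this export (a sketch, not the exact code; word_to_ix and model come from the tutorial, and the header line is the vocab size and embedding dimension, mimicking word2vec's text format):

weights = model.embeddings.weight.detach().cpu()            # the trained embedding matrix
with open("after.vec", "w") as f:
    f.write(f"{weights.shape[0]} {weights.shape[1]}\n")     # the "97 10" header
    for word, idx in word_to_ix.items():
        values = " ".join(f"{v:.5f}" for v in weights[idx].tolist())
        f.write(f"{word} {values}\n")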