Hi everyone, this is my first post here, so sorry if I'm missing something.
I've implemented the word2vec algorithm following the C code posted by Mikolov, which can be found here.
This is my implementation of the CBOW algorithm with negative sampling, where:

- `u_embs` are of dim `[N, H]`, so there are `N` target vectors of dim `H` to be predicted by the context vectors
- `v_embs` are of dim `[N, C, H]`, so for each of the `N` target vectors I get `C` context vectors of dim `H`
- `neg_v` are of dim `[N, M, H]`, so for each of the `N` target vectors I get `M` negative vectors of dim `H`, where `M` is the negative sampling size chosen by the user
- `pos_u`, `pos_v`, `neg_v` contain the word ids of the target, context, and negative examples respectively. `pos_v` has been padded with 0 to make the context sizes uniform.
```python
import torch
import torch.nn.functional as F
from torch.nn import init


# Word2Vec is my base class; it creates self.u_embs and self.v_embs.
class CBOW(Word2Vec):
    def __init__(self, emb_size, emb_dimension, cbow_mean=True):
        super(CBOW, self).__init__(emb_size, emb_dimension)
        self.cbow_mean = cbow_mean
        init_range = 0.5 / self.emb_dimension
        init.uniform_(self.v_embs.weight.data, -init_range, init_range)
        init.constant_(self.u_embs.weight.data, 0)
        self.v_embs.weight.data[0, :] = 0  # Set padding vector to 0

    def forward(self, pos_u, pos_v, neg_v):
        u_embs = self.u_embs(pos_u)  # u_embs are the "target" vectors, [N, H]
        v_embs = self.v_embs(pos_v)  # v_embs are the "context" vectors, [N, C, H]

        # Mean of context vectors without considering padding idx (0)
        if self.cbow_mean:
            mean_v_embs = torch.true_divide(
                v_embs.sum(dim=1),
                (pos_v != 0).sum(dim=1, keepdim=True),
            )
        else:
            mean_v_embs = v_embs.sum(dim=1)

        # Positive score: dot product of target and mean context, [N]
        score = torch.mul(u_embs, mean_v_embs)
        score = torch.sum(score, dim=1)
        score = F.logsigmoid(score)

        # Negative score: dot products of negatives and target, [N, M, 1]
        neg_score = torch.bmm(self.v_embs(neg_v), u_embs.unsqueeze(2))
        neg_score = F.logsigmoid(-1 * neg_score)

        return -1 * (score.sum() + neg_score.sum())
```
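Assuming the `Word2Vec` base class creates `u_embs` and `v_embs` as `nn.Embedding(emb_size, emb_dimension)`, a quick shape check of the forward pass looks like this (dummy ids, sizes chosen arbitrarily):

```python
# Dummy smoke test; vocabulary size and dims are arbitrary.
model = CBOW(emb_size=1000, emb_dimension=100)
pos_u = torch.randint(1, 1000, (8,))    # N=8 target word ids
pos_v = torch.randint(0, 1000, (8, 5))  # C=5 context slots, 0 used as padding
neg_v = torch.randint(1, 1000, (8, 3))  # M=3 negative samples per target
loss = model(pos_u, pos_v, neg_v)       # scalar: -(score.sum() + neg_score.sum())
```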
All is working fine, except for the results I get when I try to evaluate the learned embeddings (I save the `self.v_embs` embedding).
I have also implemented the Skip-Gram algorithm, where the only thing that changes is the line `score = torch.mul(u_embs, mean_v_embs)`, which becomes `score = torch.mul(u_embs, v_embs)` (in Skip-Gram `u_embs` and `v_embs` have the same dimensions, and there is no mean to be computed). Since with Skip-Gram I obtain results similar to those of gensim and Mikolov, I'm wondering if the culprit could be the mean computation.
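One thing I notice while writing this up: the padding row of `v_embs` is only zeroed at initialization, so gradients from the padded positions can make it non-zero again during training, which would contaminate the sum inside the mean. A minimal sketch of a mean that is safe against this, assuming a plain `nn.Embedding` with `padding_idx=0` (names are illustrative, not my training code):

```python
import torch
from torch import nn

# Illustrative sketch: padding_idx keeps row 0 at zero even after optimizer
# updates, so padded slots contribute exact zero vectors to the sum.
emb = nn.Embedding(num_embeddings=100, embedding_dim=8, padding_idx=0)

pos_v = torch.tensor([[1, 2, 0, 0],   # N=2 examples, C=4 context slots, 0 = padding
                      [3, 4, 5, 0]])
v_embs = emb(pos_v)                   # [N, C, H]

n_real = (pos_v != 0).sum(dim=1, keepdim=True)  # real (non-padding) words per row
mean_v_embs = v_embs.sum(dim=1) / n_real        # [N, H] mean over real vectors only
```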
Gensim CBOW
Serial | Dataset | Num Pairs | Not found | Rho |
---|---|---|---|---|
1 | EN-SimVerb-3500.txt | 3500 | 255 | 0.1324 |
2 | EN-YP-130.txt | 130 | 12 | 0.1754 |
3 | EN-RG-65.txt | 65 | 0 | 0.4973 |
4 | EN-MEN-TR-3k.txt | 3000 | 13 | 0.5335 |
5 | EN-WS-353-REL.txt | 252 | 1 | 0.5849 |
6 | EN-SIMLEX-999.txt | 999 | 7 | 0.2567 |
7 | EN-MTurk-771.txt | 771 | 2 | 0.5081 |
8 | EN-MC-30.txt | 30 | 0 | 0.5343 |
9 | EN-RW-STANFORD.txt | 2034 | 1083 | 0.3422 |
10 | EN-WS-353-ALL.txt | 353 | 2 | 0.6282 |
11 | EN-WS-353-SIM.txt | 203 | 1 | 0.6768 |
12 | EN-MTurk-287.txt | 287 | 3 | 0.6159 |
13 | EN-VERB-143.txt | 144 | 0 | 0.3538 |
My CBOW
Serial | Dataset | Num Pairs | Not found | Rho |
---|---|---|---|---|
1 | EN-SimVerb-3500.txt | 3500 | 255 | 0.1031 |
2 | EN-YP-130.txt | 130 | 12 | 0.1235 |
3 | EN-RG-65.txt | 65 | 0 | 0.3562 |
4 | EN-MEN-TR-3k.txt | 3000 | 13 | 0.4226 |
5 | EN-WS-353-REL.txt | 252 | 1 | 0.4534 |
6 | EN-SIMLEX-999.txt | 999 | 7 | 0.2395 |
7 | EN-MTurk-771.txt | 771 | 2 | 0.4255 |
8 | EN-MC-30.txt | 30 | 0 | 0.5637 |
9 | EN-RW-STANFORD.txt | 2034 | 1083 | 0.3147 |
10 | EN-WS-353-ALL.txt | 353 | 2 | 0.5190 |
11 | EN-WS-353-SIM.txt | 203 | 1 | 0.5775 |
12 | EN-MTurk-287.txt | 287 | 3 | 0.5172 |
13 | EN-VERB-143.txt | 144 | 0 | 0.3202 |
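For reference, each Rho is the Spearman correlation between the dataset's human similarity scores and the cosine similarity of the learned vectors, with pairs containing out-of-vocabulary words skipped (the "Not found" column). Roughly like this (a simplified sketch, not my exact evaluation script):

```python
import numpy as np
from scipy.stats import spearmanr

def evaluate(pairs, word_vectors):
    """pairs: list of (word1, word2, human_score); word_vectors: dict word -> np.ndarray."""
    human, model, not_found = [], [], 0
    for w1, w2, gold in pairs:
        if w1 not in word_vectors or w2 not in word_vectors:
            not_found += 1  # pair skipped, counted in the "Not found" column
            continue
        v1, v2 = word_vectors[w1], word_vectors[w2]
        cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
        human.append(gold)
        model.append(cos)
    rho, _ = spearmanr(human, model)
    return rho, not_found
```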
I know the results are fairly close, but not on all tests, and bear in mind that with Skip-Gram my results differ from gensim's by only 1% to 3%.
Sorry for the long post, and thank you all.
Federico