Hi all. I’m currently fine-tuning by retrofitting contextualized sentence embeddings: given two sentences’ vector representations, e.g. vec1
and vec2
, I want to pull these two vectors closer together. What confuses me is whether I can create a single sentence encoder (SentEncoder
, like BERT), pass both sentences through that same model to get their two vector representations, then compute the distance loss and run backpropagation and optimization (Option 1),
or whether I need to create two models (model1
and model2
) and pass one sentence to each (Option 2).
Essentially, I think the question boils down to: can I forward the model multiple times and then backpropagate once?
It would be great if you could briefly describe what the computation graph looks like in this situation.
model = BERT(...)
criterion = lambda vec1, vec2: (vec1 - vec2).norm(dim=-1).sum()
optimizer = optim.Adam(model.parameters())

for epoch in range(n_epoch):
    for sents1, sents2 in sent_loader:
        ##### OPTION 1 #####
        vec1 = model(sents1)
        vec2 = model(sents2)
        ####################

        ##### OPTION 2 #####
        # model1 and model2 share the same parameters
        vec1 = model1(sents1)
        vec2 = model2(sents2)
        ####################

        optimizer.zero_grad()
        # use the distance (norm)
        # between vec1 and vec2 as the loss
        loss = criterion(vec1, vec2)
        loss.backward()
        optimizer.step()
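To make the question concrete, here is a minimal, self-contained sketch of Option 1, with a toy nn.Linear encoder standing in for BERT (the module, tensor shapes, and data are placeholders I made up for illustration). Both forward passes go through the same module, so they should end up in one computation graph and a single backward() should accumulate gradients from both passes:

```python
import torch

# Toy stand-in for the sentence encoder (hypothetical; BERT in the real setup)
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters())

# Fake "sentence" batches: 3 sentences, 4 features each
sents1 = torch.randn(3, 4)
sents2 = torch.randn(3, 4)

# Option 1: two forward passes through the SAME module
vec1 = model(sents1)
vec2 = model(sents2)

optimizer.zero_grad()
# distance (norm) between the two embeddings as the loss
loss = (vec1 - vec2).norm(dim=-1).sum()
loss.backward()   # one backward pass; gradients from both forwards accumulate
optimizer.step()
```

If this is valid, the graph would contain two branches (one per forward pass) that both feed into the same parameters, and backward() would simply sum their gradient contributions.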
Thanks!