Hi all. I'm currently fine-tuning by retrofitting contextualized sentence embeddings. That is, given two sentences' vector representations, e.g. `vec1` and `vec2`, I want to pull these two vectors closer together. What confuses me is whether I can create a single sentence encoder (`SentEncoder`, e.g. BERT), pass both sentences through that one model to get the two vector representations, then compute the distance loss and run backpropagation and an optimizer step (Option 1). Or do I need to create two models (`model1` and `model2`) with shared parameters and pass one sentence to each (Option 2)?

In short, I think the real question is: can I forward the model multiple times and then do backprop once?

It would be great if you could briefly describe what the computation graph looks like in this situation.

```python
model = BERT(...)
# distance (norm) between vec1 and vec2 as the loss
criterion = lambda vec1, vec2: (vec1 - vec2).norm(dim=-1).sum()
optimizer = optim.Adam(model.parameters())

for epoch in range(n_epoch):
    for sents1, sents2 in sent_loader:
        ##### OPTION 1: one encoder, two forward passes #####
        vec1 = model(sents1)
        vec2 = model(sents2)
        #####################################################

        ##### OPTION 2: model1 and model2 share parameters #####
        # vec1 = model1(sents1)
        # vec2 = model2(sents2)
        #########################################################

        optimizer.zero_grad()
        loss = criterion(vec1, vec2)
        loss.backward()
        optimizer.step()
```
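For reference, here is a toy version of Option 1 that I believe shows the pattern I'm asking about, with a small `nn.Linear` as a hypothetical stand-in for the BERT encoder (the layer sizes and inputs are made up just for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for SentEncoder / BERT: a tiny linear "encoder"
encoder = nn.Linear(4, 2, bias=False)

# Fake batches standing in for the two encoded sentences
x1 = torch.randn(3, 4)
x2 = torch.randn(3, 4)

# Two forward passes through the SAME module; each pass adds its own
# nodes to the autograd graph, both pointing at the same parameters
vec1 = encoder(x1)
vec2 = encoder(x2)

# One loss over both outputs, one backward pass
loss = (vec1 - vec2).norm(dim=-1).sum()
loss.backward()

# Gradient contributions from both forwards accumulate into the
# single shared weight tensor
print(encoder.weight.grad.shape)  # torch.Size([2, 4])
```

My understanding is that `.backward()` here traverses both branches of the graph and sums their gradients into `encoder.weight.grad`, which is why I suspect Option 1 alone should be enough, but I'd appreciate confirmation.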

Thanks!