Question about retrofitting

Hi all. I’m currently fine-tuning contextualized sentence embeddings by retrofitting. That is, given the vector representations of two sentences, e.g. vec1 and vec2, I want to pull the two vectors closer together. What confuses me is whether I can create a single sentence encoder (SentEncoder, e.g. BERT), pass both sentences through that one model to get their two vector representations, and then compute the distance loss, backpropagate, and take an optimizer step. (Option 1)

Or do I need to create two models (model1 and model2) and pass one sentence to each? (Option 2)

Really, I think the question is whether I can run the model's forward pass multiple times and backpropagate only once.

It would be great if you could briefly describe what the computation graph looks like in this situation.

from torch import optim

model = BERT(...)  # placeholder for the sentence encoder
criterion = lambda vec1, vec2: (vec1 - vec2).norm(dim=-1).sum()
optimizer = optim.Adam(model.parameters())

for epoch in range(n_epoch):
    for sents1, sents2 in sent_loader:

        ##### OPTION 1 #####
        vec1 = model(sents1)
        vec2 = model(sents2)
        ####################

        ##### OPTION 2 #####
        # model1 and model2 share the same parameters
        vec1 = model1(sents1)
        vec2 = model2(sents2)
        ####################

        optimizer.zero_grad()
        # use distance (norm)
        # between vec1 and vec2 as criterion
        loss = criterion(vec1, vec2)
        loss.backward()
        optimizer.step()

Thanks!
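
For what it's worth, here is a minimal sketch of Option 1 that can be run without BERT. The small `nn.Sequential` encoder and the random `sents1` / `sents2` tensors are stand-ins I made up for illustration. They are not part of the original setup. The sketch checks that the gradient from one joint backward pass equals the sum of the gradients flowing through each forward pass separately, which is what autograd should do when the same parameters are used twice in one graph.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for the sentence encoder; a real BERT-style model would go here.
encoder = nn.Sequential(nn.Linear(8, 16), nn.Tanh(), nn.Linear(16, 4))
params = list(encoder.parameters())

# Fake paired "sentences": 5 pairs of 8-dimensional feature vectors (assumed data).
sents1 = torch.randn(5, 8)
sents2 = torch.randn(5, 8)

def criterion(vec1, vec2):
    # Same distance loss as in the question.
    return (vec1 - vec2).norm(dim=-1).sum()

# Option 1: two forward passes through the same module, one backward pass.
loss = criterion(encoder(sents1), encoder(sents2))
joint_grads = torch.autograd.grad(loss, params)

# Gradient through each branch alone (the other branch is detached from the graph).
grads_branch1 = torch.autograd.grad(
    criterion(encoder(sents1), encoder(sents2).detach()), params)
grads_branch2 = torch.autograd.grad(
    criterion(encoder(sents1).detach(), encoder(sents2)), params)

# The joint gradient should be the sum of the two branch gradients.
grads_match = all(
    torch.allclose(j, g1 + g2, atol=1e-6)
    for j, g1, g2 in zip(joint_grads, grads_branch1, grads_branch2)
)
print("joint gradient == sum of branch gradients:", grads_match)

# The training loop then needs only one model and one optimizer.
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-2)
initial_loss = loss.item()
for _ in range(50):
    optimizer.zero_grad()
    loss = criterion(encoder(sents1), encoder(sents2))
    loss.backward()   # one backward pass covers both forward passes
    optimizer.step()

with torch.no_grad():
    final_loss = criterion(encoder(sents1), encoder(sents2)).item()
print(initial_loss, final_loss)
```

Under these assumptions, a second model object with tied weights (Option 2) should not be needed: calling the same module twice already shares the parameters, and the single `backward()` accumulates both contributions into each parameter's `.grad`.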

Is this solution useful? How can I see the data flow in the network?
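
One way to look at the data flow is to walk the backward graph starting from `loss.grad_fn` and following `next_functions`. The sketch below does that for a hypothetical one-layer encoder (an `nn.Linear` chosen only to keep the graph small; the helper `walk` is my own, not a PyTorch function). It lists the node types it visits and counts how many graph edges lead to each parameter.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical one-layer stand-in for the encoder, kept tiny so the graph is readable.
encoder = nn.Linear(8, 4)
names = {id(p): n for n, p in encoder.named_parameters()}

vec1 = encoder(torch.randn(5, 8))   # first forward pass
vec2 = encoder(torch.randn(5, 8))   # second forward pass, same parameters
loss = (vec1 - vec2).norm(dim=-1).sum()

def walk(root):
    """Traverse the backward graph; return visited node type names and
    the number of edges that reach each parameter's gradient accumulator."""
    node_types, param_uses = [], {}
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        node_types.append(type(node).__name__)
        for child, _ in node.next_functions:
            if child is None:           # input that does not require grad
                continue
            if hasattr(child, "variable"):  # gradient accumulator of a leaf tensor
                name = names.get(id(child.variable), "unknown")
                param_uses[name] = param_uses.get(name, 0) + 1
            stack.append(child)
    return node_types, param_uses

node_types, param_uses = walk(loss.grad_fn)
print(node_types)   # exact node names can vary between PyTorch versions
print(param_uses)
```

With two forward passes through the same layer, each parameter should be reached twice, once per branch. That is the graph-level picture of "forward twice, backward once": the two branches meet at the subtraction and both lead back to the same weight and bias.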