Hi everyone,
I have a basic LSTM encoder that encodes some texts. I then feed the hidden representations to another module for further processing. Up to that point, nothing special.
Later, I need negative samples, whose hidden states will be fed to that other module to compute a hinge loss (https://en.wikipedia.org/wiki/Hinge_loss). However, I don't want the encoder to update its weights twice (or should I?).
In pseudo code, I have something like:
import torch

batch_texts = ...  # batch of positive texts
batch_texts_hidden = myEncoder(batch_texts)  # a graph is recorded for this pass

batch_neg = ...  # batch of negative samples
with torch.no_grad():
    # no graph is recorded here, so this pass cannot update the encoder
    batch_neg_hidden = myEncoder(batch_neg)

loss = myHinge(batch_texts_hidden, batch_neg_hidden)
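(For reference, since myHinge isn't shown above, here is a minimal sketch of the kind of hinge loss I mean. The cosine-similarity scoring and the margin of 1.0 are placeholder assumptions for illustration, not necessarily what I actually use.)

import torch.nn.functional as F

def myHinge(pos_hidden, neg_hidden, margin=1.0):
    # Hypothetical scoring: cosine similarity between each positive
    # hidden state and its paired negative hidden state.
    scores = F.cosine_similarity(pos_hidden, neg_hidden, dim=-1)
    # Hinge: penalize positive/negative pairs that are still too similar,
    # i.e. max(0, margin + score), averaged over the batch.
    return F.relu(margin + scores).mean()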
With this setup: 1) is the gradient computed for the other module, and only once for myEncoder (i.e., only through batch_texts)? 2) Do you think the gradient should instead flow through the encoder twice?
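To make question 1) concrete, here is how I would check it (continuing the pseudo code above; this assumes myEncoder is a regular nn.Module):

# The positive pass recorded a graph, the no_grad pass did not:
print(batch_texts_hidden.requires_grad, batch_texts_hidden.grad_fn)  # True, <...Backward>
print(batch_neg_hidden.requires_grad, batch_neg_hidden.grad_fn)      # False, None

loss.backward()
# Encoder parameters should receive gradients only from the batch_texts pass:
for name, p in myEncoder.named_parameters():
    print(name, None if p.grad is None else p.grad.norm().item())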
Thank you for your answers!