Hi everyone,
I have a basic LSTM encoder that encodes some texts. I then feed the hidden representations to another module for further processing. Up to that point, nothing special.
Later, I need negative samples, whose hidden states will be fed to that other module to compute a hinge loss (https://en.wikipedia.org/wiki/Hinge_loss). However, I don't want the encoder to update its weights twice (or should I?).
In pseudo code, I have something like:
import torch

batch_texts = ...  # batch of positive texts
batch_texts_hidden = myEncoder(batch_texts)  # a graph is recorded for this pass

batch_neg = ...  # batch of negative samples
with torch.no_grad():
    # no graph is recorded here, so this pass cannot update the encoder
    batch_neg_hidden = myEncoder(batch_neg)

loss = myHinge(batch_texts_hidden, batch_neg_hidden)
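(For reference, since myHinge isn't shown above, here is a minimal sketch of the kind of hinge loss I mean. The cosine-similarity scoring and the margin of 1.0 are placeholder assumptions for illustration, not necessarily what I actually use.)

import torch.nn.functional as F

def myHinge(pos_hidden, neg_hidden, margin=1.0):
    # Hypothetical scoring: cosine similarity between each positive
    # hidden state and its paired negative hidden state.
    scores = F.cosine_similarity(pos_hidden, neg_hidden, dim=-1)
    # Hinge: penalize positive/negative pairs that are still too similar,
    # i.e. max(0, margin + score), averaged over the batch.
    return F.relu(margin + scores).mean()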
With this setup: 1) is the gradient computed for the other module, and only once for myEncoder (i.e., only through batch_texts)? 2) Do you think the gradient should instead flow through the encoder twice?
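To make question 1) concrete, here is how I would check it (continuing the pseudo code above; this assumes myEncoder is a regular nn.Module):

# The positive pass recorded a graph, the no_grad pass did not:
print(batch_texts_hidden.requires_grad, batch_texts_hidden.grad_fn)  # True, <...Backward>
print(batch_neg_hidden.requires_grad, batch_neg_hidden.grad_fn)      # False, None

loss.backward()
# Encoder parameters should receive gradients only from the batch_texts pass:
for name, p in myEncoder.named_parameters():
    print(name, None if p.grad is None else p.grad.norm().item())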
Thank you for your answers!