Hi,
I am working on an image captioning model whose output also feeds a second downstream model. Here is a pseudo version of my code:
model = image_caption_model()
tokens, logprobs = model(input_image)      # tokens are discrete ids (argmax/sampling over the logits)
ce_loss = criterion(logprobs, gt_tokens)   # cross-entropy against the ground-truth tokens
embeddings = model.embeddings              # the nn.Embedding layer; its .weight requires grad
token_embeddings = embeddings(tokens)      # embedding lookup on the discrete token ids
output2 = model2(token_embeddings)         # second model; assume it returns a scalar loss term
combined_loss = output2 + ce_loss
combined_loss.backward()
Now, normally I should not be doing this, because the tokens are not differentiable (they are discrete ids). Extracting embeddings from the tokens should therefore cause trouble for backprop. But when I print requires_grad, I see that token_embeddings.requires_grad is True. Since the tokens are not differentiable, I assumed it would be False. Can someone explain the reason, and whether the current strategy is correct?
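
For reference, here is a minimal standalone sketch (a toy nn.Embedding with made-up sizes, not my real model) that reproduces the behaviour I am asking about:

import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=100, embedding_dim=16)  # emb.weight.requires_grad is True by default

tokens = torch.tensor([3, 7, 42])        # discrete token ids, e.g. from an argmax
print(tokens.requires_grad)              # False: integer tensors carry no gradient

token_embeddings = emb(tokens)
print(token_embeddings.requires_grad)    # True: the lookup is differentiable w.r.t. emb.weight

loss = token_embeddings.sum()
loss.backward()
print(emb.weight.grad is not None)       # True: gradients reach the embedding table,
                                         # but nothing flows back through the integer ids

So it looks like requires_grad is True because the lookup output depends on the embedding weights, even though the ids themselves carry no gradient. Is that the right way to read it?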