Hi, I have a CNN model that produces the index of an object in a vocabulary. The index is then embedded (by nn.Embedding) and fed into a language model for further processing. What I need here is end-to-end learning. However, the problem seems to be that the learning error cannot be propagated back to the CNN model through the embedding layer. Since the language model is pre-trained, I cannot make any changes to the word embedding (i.e. nn.Embedding) in this case. Does anyone have experience with this? Thanks a lot for the help in advance!
The operation that goes from your first network's output to the index is non-differentiable (or its gradient is 0 almost everywhere). So you cannot backprop through that operation.
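You can see this cut in the autograd graph directly. A minimal sketch (the logits tensor here stands in for your CNN's output; the shapes are assumptions):

```python
import torch

# Hypothetical CNN output: scores over a 10-word vocabulary
logits = torch.randn(1, 10, requires_grad=True)

idx = logits.argmax(dim=-1)  # hard index selection

print(idx.dtype)          # torch.int64 -- an integer tensor
print(idx.requires_grad)  # False: the graph stops here, no gradient flows back
```

Because `argmax` returns integer indices, autograd cannot track it, and everything upstream of the embedding lookup receives no gradient.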
A possible solution could be to do a soft indexing into your embedding layer based on the continuous output of your first network. That is not supported by Embedding directly, so you will need to do it by hand with a matrix-vector product against the weights of the Embedding layer.
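A minimal sketch of that soft indexing, assuming your CNN ends in a vector of vocabulary scores (the sizes and shapes here are placeholders):

```python
import torch
import torch.nn.functional as F

vocab_size, embed_dim = 10, 8

# The pre-trained embedding, kept frozen
embedding = torch.nn.Embedding(vocab_size, embed_dim)
embedding.weight.requires_grad = False

# Continuous CNN output over the vocabulary (assumed shape)
logits = torch.randn(1, vocab_size, requires_grad=True)

# Soft indexing: softmax gives a differentiable weighting over the
# vocabulary, and a matrix product with the embedding weights replaces
# the hard nn.Embedding lookup.
probs = F.softmax(logits, dim=-1)      # (1, vocab_size)
soft_embed = probs @ embedding.weight  # (1, embed_dim)

# Gradients now flow back to the CNN side, while the embedding stays fixed
soft_embed.sum().backward()
print(logits.grad is not None)  # True
```

The resulting `soft_embed` is a convex combination of embedding rows rather than a single row, so you'd feed it to the language model in place of the usual embedding output. To push the softmax toward a hard one-hot selection, a temperature (or a Gumbel-softmax) is a common refinement.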