Hi! I am working on a personal NLP project that involves word and character embeddings.
I want to merge pre-computed word embeddings and word embeddings built from character embeddings by concatenating them. The pre-computed embeddings have shape [batch_size; sentence_length; 200], where batch_size is the number of sentences. The char-based word embeddings have shape [number_of_words; 60], because when computing the char embeddings I have to concatenate all words of all sentences in the batch to make the LSTM input 3D ([number_of_words; word_length; char_embedding_dim]).
I also have, for the char embeddings, the indices that I can use to split the words back into sentences. I reshape the output of the char embedding layer into [batch_size; sentence_length; 60] like this:
```python
import numpy as np
from torch.nn.utils.rnn import pad_sequence

x_char = self.char_embeddings(x_char, x_char_mask)  # shape: [number_of_words; 60]
x_char = pad_sequence(np.split(x_char, splits[:-1]), batch_first=True)  # shape: [batch_size; sentence_length; 60]
```
In the snippet above, the variable splits contains the end position of each sentence in the flattened word dimension. I am wondering whether using np.split to slice the tensor will tamper with the computational graph built by autograd and block backpropagation to the char embedding LSTM. Can anybody help?
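One way to answer this kind of question empirically is to backpropagate a loss and check whether the LSTM's parameters actually receive gradients. The sketch below uses a hypothetical helper, grads_reach, and toy sizes (8-dim char vectors, 4 words of 7 characters); .detach() stands in for any op that leaves the autograd graph. In a real model you would pass the char LSTM and the final training loss instead.

```python
import torch
import torch.nn as nn

def grads_reach(module: nn.Module, loss: torch.Tensor) -> bool:
    """Hypothetical helper: backprop `loss` and report whether every
    trainable parameter of `module` received a gradient."""
    module.zero_grad(set_to_none=True)
    if not loss.requires_grad:
        # Nothing upstream of `loss` requires grad: the graph was cut.
        return False
    loss.backward()
    return all(p.grad is not None for p in module.parameters() if p.requires_grad)

torch.manual_seed(0)
lstm = nn.LSTM(input_size=8, hidden_size=60, batch_first=True)
chars = torch.randn(4, 7, 8)  # [number_of_words; word_length; char_dim]

# Attached path: the word vectors keep their grad_fn.
_, (h_n, _) = lstm(chars)
words = h_n[-1]  # [number_of_words; 60]
attached_ok = grads_reach(lstm, words.sum())

# Cut path: .detach() simulates an op that leaves the autograd graph.
_, (h_n, _) = lstm(chars)
detached_ok = grads_reach(lstm, h_n[-1].detach().sum())

print(attached_ok, detached_ok)  # True False
```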
Are you sure you need to make the character input 3D?
nn.Embedding accepts input of any shape.
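For illustration, here is what that looks like with a 3D tensor of character ids ([batch_size; sentence_length; word_length]); the vocabulary size of 50 and the 16-dim char vectors are arbitrary assumptions. The embedding simply appends a feature dimension, but nn.LSTM itself only accepts 3D input, so the batch and sentence dimensions still have to be folded together before the RNN.

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 50-symbol character vocabulary, 16-dim char vectors.
char_emb = nn.Embedding(num_embeddings=50, embedding_dim=16, padding_idx=0)

# Char ids with shape [batch_size; sentence_length; word_length] = [2; 5; 7]
char_ids = torch.randint(1, 50, (2, 5, 7))

out = char_emb(char_ids)
print(out.shape)  # torch.Size([2, 5, 7, 16]): one vector per character

# nn.LSTM only takes 3D input, so merge the batch and sentence dimensions.
lstm_in = out.reshape(-1, out.size(2), out.size(3))  # [10; 7; 16]
```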
Hi phan_phan! Sorry, I was not clear enough. To create the char-based word embeddings, I first use the nn.Embedding module to encode each character into a vector; then I take all the characters of a word and feed them into an RNN (an LSTM, to be precise) to obtain the word embedding from the character embeddings. The RNN needs a 3D input, so I have to "concatenate all the sentences" along the same dimension and then split them again to merge the char-based word embeddings with, say, GloVe word embeddings that I get from a separate layer.
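A minimal sketch of the char-to-word step described above, assuming a padded [number_of_words; word_length] tensor of char ids (0 = padding) plus each word's true length. The class name CharWordEncoder and its sizes are hypothetical; packing the sequence makes the LSTM's final hidden state correspond to each word's real last character rather than to padding.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

class CharWordEncoder(nn.Module):
    """Hypothetical char-to-word encoder: embed each character, run an LSTM
    over the characters of each word, keep the final hidden state."""

    def __init__(self, n_chars: int, char_dim: int = 16, word_dim: int = 60):
        super().__init__()
        self.embed = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.lstm = nn.LSTM(char_dim, word_dim, batch_first=True)

    def forward(self, char_ids: torch.Tensor, word_lengths: torch.Tensor) -> torch.Tensor:
        # char_ids: [number_of_words; word_length]
        x = self.embed(char_ids)  # [number_of_words; word_length; char_dim]
        # Packing lets the LSTM stop at each word's real length;
        # with enforce_sorted=False, h_n comes back in the original word order.
        packed = pack_padded_sequence(
            x, word_lengths.cpu(), batch_first=True, enforce_sorted=False
        )
        _, (h_n, _) = self.lstm(packed)
        return h_n[-1]  # [number_of_words; word_dim]

enc = CharWordEncoder(n_chars=30)
char_ids = torch.tensor([[3, 4, 5, 0], [6, 7, 0, 0], [8, 9, 10, 11]])
lengths = torch.tensor([3, 2, 4])
words = enc(char_ids, lengths)
print(words.shape)  # torch.Size([3, 60])
```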
As a general rule, any op that is not provided by PyTorch will break the graph (ideally, it should fail to run if the Tensor requires grad).
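As a small illustration of that rule: converting a tensor that requires grad to NumPy is refused with an error, and forcing the conversion with .detach() silently produces a tensor with no autograd history. The toy tensor below is only for demonstration.

```python
import torch

x = torch.randn(3, requires_grad=True)

# Converting a tensor that requires grad to NumPy is refused outright.
try:
    x.numpy()
    refused = False
except RuntimeError:
    refused = True

# Forcing it with .detach() works, but the round trip loses the history.
y = torch.from_numpy(x.detach().numpy() * 2)
print(refused, y.grad_fn, y.requires_grad)  # True None False
```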
But here you can simply use torch.split (torch.chunk only cuts a tensor into a given number of nearly equal chunks), and everything will be differentiable.
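A sketch of the pure-PyTorch version of the reshape, followed by the concatenation with the pre-computed embeddings. It assumes a toy batch of 3 sentences with 2, 4 and 1 words, and that the pre-computed embeddings are padded to the same maximum sentence length. torch.split wants the size of each piece, whereas torch.tensor_split takes split offsets, like np.split does, so either fits depending on what splits stores.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Hypothetical batch: 3 sentences with 2, 4 and 1 words (7 words in total).
lengths = [2, 4, 1]
x_char = torch.randn(sum(lengths), 60, requires_grad=True)  # [number_of_words; 60]
x_word = torch.randn(len(lengths), max(lengths), 200)  # [batch_size; sentence_length; 200]

# torch.split takes the size of each piece...
pieces = torch.split(x_char, lengths, dim=0)
# ...while torch.tensor_split takes split offsets, as np.split does.
offsets = torch.tensor(lengths).cumsum(0).tolist()  # [2, 6, 7]
pieces_alt = torch.tensor_split(x_char, offsets[:-1], dim=0)

x_char_padded = pad_sequence(pieces, batch_first=True)  # [3; 4; 60]
merged = torch.cat([x_word, x_char_padded], dim=-1)  # [3; 4; 260]

# Each word vector appears exactly once in `merged`,
# so the gradient reaching x_char is all ones.
merged.sum().backward()
```

Only torch ops are involved, so the gradient flows back through pad_sequence and the split to x_char (and, in the real model, on to the char LSTM).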
Thanks! torch.split is exactly what I was looking for. Just a follow-up question based on your response: is slicing via [ ] safe for autograd?
```python
my_tensor[new_indices, :, :]  # can I re-order a 3D tensor like this, or should I use torch.index_select?
```
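A quick way to convince yourself: advanced indexing along dim 0 gives the same values as torch.index_select, and the gradient is routed back to the original rows. The permutation and the per-row weights below are arbitrary choices made only so the routing is visible in the gradient.

```python
import torch

my_tensor = torch.randn(4, 3, 2, requires_grad=True)
new_indices = torch.tensor([2, 0, 3, 1])  # hypothetical permutation

a = my_tensor[new_indices, :, :]
b = torch.index_select(my_tensor, 0, new_indices)
print(torch.equal(a, b))  # True

# Weight output row k by k, so input row new_indices[k] gets gradient k.
weights = torch.arange(4.0).view(4, 1, 1)
(a * weights).sum().backward()
print(my_tensor.grad[:, 0, 0])  # tensor([1., 3., 0., 2.])
```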
Yes it is.
In general in PyTorch, all differentiable functions either return correct gradients or raise an error. So if something is differentiable and runs, you are safe: it computes the right thing.
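If you want to double-check a specific composition of ops, torch.autograd.gradcheck compares autograd's gradients against finite differences (it needs double-precision inputs). The function below is a hypothetical stand-in for the split, pad and reorder steps discussed in this thread, with made-up sentence lengths of 2 and 3 words.

```python
import torch
from torch.autograd import gradcheck
from torch.nn.utils.rnn import pad_sequence

def split_and_pad(x: torch.Tensor) -> torch.Tensor:
    # Hypothetical pipeline: split 5 words into sentences of 2 and 3 words,
    # pad them to [2; 3; 4], then swap the two sentences by indexing.
    pieces = torch.split(x, [2, 3], dim=0)
    padded = pad_sequence(pieces, batch_first=True)
    return padded[torch.tensor([1, 0]), :, :]

x = torch.randn(5, 4, dtype=torch.double, requires_grad=True)
ok = gradcheck(split_and_pad, (x,))  # raises if the gradients disagree
print(ok)  # True
```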