Does reshaping/slicing a tensor with np.split break the autograd graph?

kekgle · October 21, 2020, 9:52am

Hi! I am working on a personal NLP project that involves word and character embeddings.

I want to merge pre-computed word embeddings with word embeddings built from character embeddings by concatenating them. The pre-computed embeddings have shape [batch_size; sentence_length; 200], where batch_size is the number of sentences. The char-based word embeddings have shape [number_of_words; word_length; 60], because when computing char embeddings I have to concatenate all words of all sentences in the batch to make the input 3D.

I also have, for char embeddings, the indexes that I can use to separate the words back into sentences. I reshape the output of the char embedding layer into [batch_size; sentence_length; 60] by doing this:

x_char = self.char_embeddings(x_char, x_char_mask)   # shape: [number_of_words; word_length; 60]
x_char = pad_sequence(np.split(x_char, splits[:-1]), batch_first=True)  # shape: [batch_size; sentence_length; 60]

In the snippet above, the variable splits contains the position of the last token of each sentence. I am wondering whether using np.split to slice the tensor will tamper with the computational graph built by autograd and block backprop to the char embedding LSTM. Can anybody help?

phan_phan · October 21, 2020, 11:33am

Are you sure you need to make the character input 3D?
nn.Embedding is supposed to work with any input shape.

kekgle · October 21, 2020, 1:06pm

Hi phan_phan! I’m sorry I was not clear enough. To create the char-based word embeddings, I first use the nn.Embedding module to encode each character into a vector. Then I take all the characters of a word and feed them into an RNN (an LSTM, to be precise) to obtain the word embedding from the character embeddings. The RNN needs the input to be 3D, so I have to “concatenate all the sentences” along the same dimension and then separate them again to merge the char-based word embeddings with, say, GloVe word embeddings that I get from a separate layer.

albanD · October 21, 2020, 2:08pm

Hi,

As a general rule, any op that is not provided by PyTorch will break the graph (and should ideally fail to run if the Tensor requires grad).
But here you can simply use torch.split or torch.chunk, and everything will be differentiable.
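
One thing to watch when swapping the call: np.split takes split indices, while torch.split takes section sizes, so the indices need converting first. Below is a minimal sketch of that conversion. It assumes splits holds the cumulative end position of each sentence and that the char-based word embeddings have already been reduced to one 60-dimensional vector per word; the tensor contents and sentence lengths are made up for illustration.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Hypothetical stand-in for the char-LSTM output: 7 words, 60 features each.
x_char = torch.randn(7, 60, requires_grad=True)

# Assumed meaning of `splits`: cumulative end index of each sentence,
# here three sentences with 3, 2 and 2 words.
splits = [3, 5, 7]

# torch.split expects section sizes, not indices, so convert first.
sizes = [end - start for start, end in zip([0] + splits[:-1], splits)]

# One [n_words_i, 60] tensor per sentence.
sentences = list(torch.split(x_char, sizes, dim=0))

# Pad to a common length: [batch_size, max_sentence_length, 60].
padded = pad_sequence(sentences, batch_first=True)

# Backprop through pad_sequence and torch.split reaches x_char.
padded.sum().backward()
print(padded.shape)
print(x_char.grad.shape)
```

Newer PyTorch releases also provide torch.tensor_split, which accepts indices in the style of np.split, so the conversion step may not be needed there.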

kekgle · October 21, 2020, 2:40pm

Thanks! torch.split is exactly what I was looking for. Just a follow-up question based on your response. Is slicing via [ ] safe for autograd?

my_tensor[new_indices, :, :]  # can I re-order a 3D tensor like this? or should I use torch.index_select?

albanD · October 21, 2020, 2:47pm

Yes it is.

In general in PyTorch, every differentiable function will either return correct gradients or raise an error. So if something is differentiable and runs, you are safe and it computes the right thing.
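
For the re-ordering case asked about above, a quick way to check this yourself is to compare the gradients from bracket indexing and torch.index_select. This is only a sketch: the tensor shape, the new_indices values, and the weights used to make the gradient position-dependent are all invented for illustration.

```python
import torch

my_tensor = torch.randn(4, 3, 2, requires_grad=True)
new_indices = torch.tensor([2, 0, 3, 1])  # hypothetical permutation of dim 0

# Weights make the gradient depend on where each slice ends up.
weights = torch.arange(4.0).view(4, 1, 1)

# Re-order with bracket indexing and backpropagate.
(my_tensor[new_indices, :, :] * weights).sum().backward()
grad_brackets = my_tensor.grad.clone()

# Same re-ordering with torch.index_select, starting from a fresh gradient.
my_tensor.grad = None
(torch.index_select(my_tensor, 0, new_indices) * weights).sum().backward()
grad_index_select = my_tensor.grad.clone()

# Both routes are tracked by autograd and give the same gradient.
print(torch.equal(grad_brackets, grad_index_select))
```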