Hi there, I have a question about how to embed irregularly shaped inputs.
In most NLP models based on RNNs, there is normally one input token per time step. According to this tutorial, if we want to run the model in mini-batch mode, we can pad the variable-length sequences so they all have the same length; the input is then of shape (max_length, batch_size), and we can embed it with nn.Embedding.
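For the regular case, what I mean is roughly this (the vocabulary size, embedding dim, and sequence values below are just made-up toy numbers):

import torch
import torch.nn as nn

PAD = 0
batch = [[1, 9, 3], [4, 5, 6, 7], [8]]  # three sequences of different lengths

# Pad every sequence to the length of the longest one.
max_length = max(len(seq) for seq in batch)
padded = [seq + [PAD] * (max_length - len(seq)) for seq in batch]

# Transpose to (max_length, batch_size): one word per time step.
inputs = torch.tensor(padded, dtype=torch.long).t()  # shape: (4, 3)

embedding = nn.Embedding(num_embeddings=10, embedding_dim=5, padding_idx=PAD)
embedded = embedding(inputs)  # shape: (4, 3, 5), ready for an RNN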
My question is: if I want to input more than one word per time step, the mini-batch input has an irregular shape, so how can we embed it with nn.Embedding? For example, I have two sequences, a=[[1,9],[3],[4,5,6],[7,8]] and b=[[4],[5,6],[8,3]]. I combine them into one mini-batch, so after padding the input looks like this (end_of_sequence token = 2, pad token = 0):
[
  [[1,9],   [4]],
  [[3],     [5,6]],
  [[4,5,6], [8,3]],
  [[7,8],   [2]],
  [[2],     [0]]
]
Is there a 'PyTorch' way to appropriately embed this input?
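One workaround I've been considering is to pad within each time step as well, embed the resulting 3-D LongTensor, and then pool over the word dimension. This is just a sketch of what I had in mind (the vocabulary size and the choice of mean-pooling are my own assumptions, not anything from the tutorial):

import torch
import torch.nn as nn

PAD = 0
# The padded mini-batch from above, shape (max_length=5, batch_size=2),
# with the inner lists additionally padded to max_words=3.
steps = [
    [[1, 9, 0], [4, 0, 0]],
    [[3, 0, 0], [5, 6, 0]],
    [[4, 5, 6], [8, 3, 0]],
    [[7, 8, 0], [2, 0, 0]],
    [[2, 0, 0], [0, 0, 0]],
]
inputs = torch.tensor(steps, dtype=torch.long)  # (5, 2, 3)

embedding = nn.Embedding(num_embeddings=10, embedding_dim=4, padding_idx=PAD)
embedded = embedding(inputs)  # (5, 2, 3, 4)

# Collapse the word dimension so each time step yields a single vector,
# here by mean-pooling over the non-pad positions.
mask = (inputs != PAD).unsqueeze(-1).float()  # (5, 2, 3, 1)
summed = (embedded * mask).sum(dim=2)         # (5, 2, 4)
counts = mask.sum(dim=2).clamp(min=1)         # avoid division by zero on all-pad steps
step_vectors = summed / counts                # (5, 2, 4) -> feed to the RNN

I also wondered whether nn.EmbeddingBag (which does sum/mean pooling internally via its offsets argument) would be a cleaner fit here, but I'm not sure how to line it up with the time-step structure.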
Thanks.