I’m working on dialog data and reinforcement learning. Basically, I want the batch input to consist of multiple rounds of dialog data, with a variable number of rounds and variable sentence lengths. For example, assuming the maximum number of rounds is 5, the maximum sentence length is 4, and the vocab size is 10, a 3D indexed raw input looks as follows:
[[ 1, 2, 3, 4], [2, 3, 7], [3, 4, 5, 6]],
[[0, 1], [ 2, 3, 6, 7], , [5, 6]],
[[2, 4] , [5, 8, 9]]
My question is how to implement the word embedding and LSTM layers in PyTorch efficiently for this kind of input. My end goal is to get the final state of each sentence in each dialog, say 3 * 5 * 8 (8 is the number of hidden units in the LSTM). As I understand it, the challenge is not only the variable sentence lengths but also the variable number of rounds per dialog.
I think the first step is to “pad” the raw indexed input into a 3 * 5 * 4 batch input. Then,
3 * 5 * 4
-> get word embedding for each dialog in the batch: 3 * 5 * 4 * 6 (embedding size is 6)
-> run LSTM on each sentence in a dialog: 3 * 5 * 8
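A minimal sketch of the padding step described above, under a few assumptions: id 0 is reserved for padding (so real words use ids 1 to 9), the toy `dialogs` list is made up for illustration, and `pad_dialogs` is a hypothetical helper, not a PyTorch function. It also returns per-sentence lengths, since those are needed later to ignore the padding.

```python
import torch

# Hypothetical toy data: 3 dialogs, each a list of sentences (lists of word ids).
# Assumption: id 0 is reserved for padding, so real words use ids 1..9.
dialogs = [
    [[1, 2, 3, 4], [2, 3, 7], [3, 4, 5, 6]],
    [[1, 1], [2, 3, 6, 7], [5, 6]],
    [[2, 4], [5, 8, 9]],
]

def pad_dialogs(dialogs, max_rounds, max_len, pad_id=0):
    """Pad to (num_dialogs, max_rounds, max_len) and record each sentence length."""
    batch = torch.full((len(dialogs), max_rounds, max_len), pad_id, dtype=torch.long)
    lengths = torch.zeros(len(dialogs), max_rounds, dtype=torch.long)
    for i, dialog in enumerate(dialogs):
        for j, sent in enumerate(dialog[:max_rounds]):
            sent = sent[:max_len]  # truncate sentences that are too long
            batch[i, j, :len(sent)] = torch.tensor(sent, dtype=torch.long)
            lengths[i, j] = len(sent)
    return batch, lengths

batch, lengths = pad_dialogs(dialogs, max_rounds=5, max_len=4)
print(batch.shape)    # 3 * 5 * 4
print(lengths)        # 0 marks a padded (missing) round
```

A length of 0 marks a round that does not exist in that dialog, which makes it possible to tell missing rounds apart from short sentences.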
I know the “pack_padded_sequence” function is useful for dealing with variable-length sentences (independent samples) in a batch. But here, how can I apply the embedding layer (which, as far as I understand, only takes 2D inputs) and the LSTM to get the sentence (turn) embedding in each dialog (dim: 5 * 4) while masking the 0s? Is there something like the TimeDistributed layer in Keras that would be useful in this case? Thanks a lot! Let me know if anything is unclear.
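For reference, one possible way to get a TimeDistributed-like effect is to merge the dialog and round dimensions, so every sentence becomes an independent sample for `pack_padded_sequence`. The sketch below is only an illustration under stated assumptions: id 0 is padding, the padded `batch` tensor is made-up toy data, and rounds with length 0 are skipped and left as zero vectors in the output (packing does not accept zero-length sequences).

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(0)

B, R, T = 3, 5, 4            # dialogs, max rounds, max sentence length
VOCAB, EMB, HID = 10, 6, 8   # sizes from the question

# Hypothetical padded batch; assumption: id 0 is padding, real words are 1..9.
batch = torch.zeros(B, R, T, dtype=torch.long)
batch[0, 0] = torch.tensor([1, 2, 3, 4])
batch[0, 1, :3] = torch.tensor([2, 3, 7])
batch[0, 2] = torch.tensor([3, 4, 5, 6])
batch[1, 0, :2] = torch.tensor([1, 1])
batch[1, 1] = torch.tensor([2, 3, 6, 7])
batch[1, 2, :2] = torch.tensor([5, 6])
batch[2, 0, :2] = torch.tensor([2, 4])
batch[2, 1, :3] = torch.tensor([5, 8, 9])
lengths = (batch != 0).sum(dim=-1)          # (B, R); 0 for missing rounds

embedding = nn.Embedding(VOCAB, EMB, padding_idx=0)
lstm = nn.LSTM(EMB, HID, batch_first=True)

# Merge dialog and round dims: every sentence becomes one sample.
flat_tokens = batch.reshape(B * R, T)
flat_lengths = lengths.reshape(B * R)
nonempty = flat_lengths > 0                 # skip rounds that do not exist

embedded = embedding(flat_tokens[nonempty])                 # (N, T, EMB)
packed = pack_padded_sequence(
    embedded, flat_lengths[nonempty].cpu(),
    batch_first=True, enforce_sorted=False,
)
_, (h_n, _) = lstm(packed)                                  # h_n: (1, N, HID)

# Scatter final states back; missing rounds stay as zero vectors.
sentence_states = torch.zeros(B * R, HID)
sentence_states[nonempty] = h_n[-1]
sentence_states = sentence_states.view(B, R, HID)           # 3 * 5 * 8
print(sentence_states.shape)
```

In this sketch `enforce_sorted=False` lets the sentences stay in their original order, and the final hidden state of each packed sequence corresponds to its last real token rather than to a padding position.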