I am trying to build a hierarchical sequence model for time series classification, following the paper "Hierarchical Attention Networks for Document Classification". However, I am confused about how to mask the hierarchical sequences.
My data is a two-level hierarchical time series: each sample is composed of multiple sub-sequences, and each sub-sequence is a multivariate time series (analogous to word -> sentence -> document in NLP). Samples do not all have the same number of sub-sequences, and sub-sequences do not all have the same number of time steps, just as documents differ in their number of sentences and sentences in their number of words. So I need to pad and mask twice, once at each level. After zero-padding, my data looks as follows:
array([[[[0.21799476, 0.26063576],
[0.2170655 , 0.53772384],
[0.18505535, 0.30702454],
[0. , 0. ],
[0. , 0. ],
[0. , 0. ],
[0. , 0. ],
[0. , 0. ]],
[[0.2160176 , 0.23789616],
[0.2675753 , 0.21807681],
[0.26932836, 0.21914595],
[0.26932836, 0.21914595],
[0. , 0. ],
[0. , 0. ],
[0. , 0. ],
[0. , 0. ]]],
[[[0.03941338, 0.33808291],
[0.04766269, 0.30310882],
[0. , 0. ],
[0. , 0. ],
[0. , 0. ],
[0. , 0. ],
[0. , 0. ],
[0. , 0. ]],
[[0. , 0. ],
[0. , 0. ],
[0. , 0. ],
[0. , 0. ],
[0. , 0. ],
[0. , 0. ],
[0. , 0. ],
[0. , 0. ]]],
[[[0.63451399, 0.38098294],
[0.40768279, 0.35421815],
[0.22714901, 0.17020352],
[0.71249301, 0.72023917],
[0.43149343, 0.42213124],
[0. , 0. ],
[0. , 0. ],
[0. , 0. ]],
[[0.54531993, 0.62312294],
[0. , 0. ],
[0. , 0. ],
[0. , 0. ],
[0. , 0. ],
[0. , 0. ],
[0. , 0. ],
[0. , 0. ]]]], dtype=float32)
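
For illustration, the double padding and the two masks it implies can be sketched in NumPy. This is only a sketch under assumptions: the helper names `pad_hierarchical` and `hierarchical_masks` are hypothetical, the toy values are made up, and the masks are derived by treating an all-zero time step as padding, which would wrongly mask a real observation whose features are all exactly zero. Keeping the true lengths alongside the data would be the safer choice in that case.

```python
import numpy as np

def pad_hierarchical(samples, max_subseqs, max_steps, n_features):
    """Zero-pad ragged nested lists into shape
    (n_samples, max_subseqs, max_steps, n_features). Hypothetical helper."""
    batch = np.zeros((len(samples), max_subseqs, max_steps, n_features),
                     dtype=np.float32)
    for i, sample in enumerate(samples):
        for j, subseq in enumerate(sample):
            subseq = np.asarray(subseq, dtype=np.float32)
            batch[i, j, :len(subseq)] = subseq  # real steps first, zeros after
    return batch

def hierarchical_masks(batch):
    """Derive both masks from the zero padding. Hypothetical helper."""
    # Assumption: a time step is real if any of its features is non-zero.
    step_mask = np.any(batch != 0, axis=-1)    # (samples, subseqs, steps)
    # A sub-sequence is real if it contains at least one real time step.
    subseq_mask = np.any(step_mask, axis=-1)   # (samples, subseqs)
    return step_mask, subseq_mask

# Toy ragged data with the same lengths as the array above:
# sample 0 has sub-sequences of 3 and 4 steps, sample 1 has one
# sub-sequence of 2 steps, sample 2 has sub-sequences of 5 and 1 steps.
samples = [
    [[[0.2, 0.3]] * 3, [[0.2, 0.2]] * 4],
    [[[0.04, 0.34], [0.05, 0.30]]],
    [[[0.6, 0.4]] * 5, [[0.5, 0.6]]],
]

batch = pad_hierarchical(samples, max_subseqs=2, max_steps=8, n_features=2)
step_mask, subseq_mask = hierarchical_masks(batch)

print(batch.shape)              # (3, 2, 8, 2)
print(step_mask.sum(axis=-1))   # number of real steps per sub-sequence
print(subseq_mask)              # which sub-sequences are real
```

The inner mask tells the lower-level encoder which time steps to skip, and the outer mask tells the upper-level encoder which sub-sequences are pure padding (the second sub-sequence of the second sample here).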
How can I build a hierarchical LSTM model for sequences that have variable length at both levels?
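
To make the intended structure concrete, here is a minimal NumPy sketch of the two-level data flow. It is not an LSTM: masked mean pooling stands in for the encoder at both levels, purely to show where each mask would be applied. The function names `masked_mean` and `encode_hierarchical` are hypothetical, and padding is again assumed to be exactly the all-zero time steps.

```python
import numpy as np

def masked_mean(x, mask):
    """Mean over the second-to-last axis of x, counting only positions
    where mask is True. Stand-in for a sequence encoder (not an LSTM)."""
    mask = mask[..., None].astype(x.dtype)       # broadcast over features
    total = (x * mask).sum(axis=-2)
    count = np.maximum(mask.sum(axis=-2), 1.0)   # avoid 0/0 for empty rows
    return total / count

def encode_hierarchical(batch):
    """batch: (samples, subseqs, steps, features), zero-padded."""
    n, s, t, f = batch.shape
    # Assumption: padding is exactly the all-zero time steps.
    step_mask = np.any(batch != 0, axis=-1)      # (n, s, t)
    subseq_mask = np.any(step_mask, axis=-1)     # (n, s)

    # Level 1: fold sub-sequences into the batch axis so the lower-level
    # encoder sees ordinary 3-D input, then unfold the result again.
    flat = batch.reshape(n * s, t, f)
    flat_mask = step_mask.reshape(n * s, t)
    subseq_vecs = masked_mean(flat, flat_mask).reshape(n, s, f)

    # Level 2: encode the sequence of sub-sequence vectors, skipping
    # sub-sequences that consist of padding only.
    return masked_mean(subseq_vecs, subseq_mask)  # (n, f)

# One sample, two sub-sequence slots, three time steps, one feature.
# Slot 0 holds the real steps [1, 3] plus one padded step; slot 1 is padding.
batch = np.zeros((1, 2, 3, 1), dtype=np.float32)
batch[0, 0, :2, 0] = [1.0, 3.0]

print(encode_hierarchical(batch))   # masked result: mean of 1 and 3
print(batch.mean(axis=(1, 2)))      # unmasked mean, diluted by padding
```

In a real model, each `masked_mean` call would be replaced by a recurrent encoder that respects the same mask; the reshape step is the part that lets a standard 3-D sequence layer handle the 4-D input.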
Could anyone show me an example?
Thanks very much!