How to build a hierarchical LSTM model for two nested levels of variable-length sequences?

I am trying to build a hierarchical sequence model for time series classification (following the paper "Hierarchical Attention Networks for Document Classification"), but I am confused about how to mask the hierarchical sequences.

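My data is a two-level hierarchical time series. Specifically, each sample is composed of multiple sub-sequences, and each sub-sequence is a multivariate time series (analogous to word -> sentence -> document in NLP). So I need to pad and mask it twice: once over the time steps within each sub-sequence, and once over the sub-sequences within each sample. This is necessary because samples do not all have the same number of sub-sequences, and sub-sequences do not all have the same number of time steps (just as documents differ in sentence count and sentences differ in word count). After padding, I get data as follows: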

array([[[[0.21799476, 0.26063576],
         [0.2170655 , 0.53772384],
         [0.18505535, 0.30702454],
         [0.        , 0.        ],
         [0.        , 0.        ],
         [0.        , 0.        ],
         [0.        , 0.        ],
         [0.        , 0.        ]],

        [[0.2160176 , 0.23789616],
         [0.2675753 , 0.21807681],
         [0.26932836, 0.21914595],
         [0.26932836, 0.21914595],
         [0.        , 0.        ],
         [0.        , 0.        ],
         [0.        , 0.        ],
         [0.        , 0.        ]]],

       [[[0.03941338, 0.33808291],
         [0.04766269, 0.30310882],
         [0.        , 0.        ],
         [0.        , 0.        ],
         [0.        , 0.        ],
         [0.        , 0.        ],
         [0.        , 0.        ],
         [0.        , 0.        ]],

        [[0.        , 0.        ],
         [0.        , 0.        ],
         [0.        , 0.        ],
         [0.        , 0.        ],
         [0.        , 0.        ],
         [0.        , 0.        ],
         [0.        , 0.        ],
         [0.        , 0.        ]]],

       [[[0.63451399, 0.38098294],
         [0.40768279, 0.35421815],
         [0.22714901, 0.17020352],
         [0.71249301, 0.72023917],
         [0.43149343, 0.42213124],
         [0.        , 0.        ],
         [0.        , 0.        ],
         [0.        , 0.        ]],

        [[0.54531993, 0.62312294],
         [0.        , 0.        ],
         [0.        , 0.        ],
         [0.        , 0.        ],
         [0.        , 0.        ],
         [0.        , 0.        ],
         [0.        , 0.        ],
         [0.        , 0.        ]]]], dtype=float32)
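To make the two padding levels concrete, here is a minimal NumPy sketch of how I think the two masks could be derived from an array laid out like the one above, i.e. (samples, sub-sequences, time steps, features). The function name `hierarchical_masks` and the `pad_value` parameter are made up for illustration, and the sketch assumes that a real time step is never all zeros, which may not hold for every dataset.

```python
import numpy as np

def hierarchical_masks(x, pad_value=0.0):
    """Derive both mask levels from a padded 4-D array.

    x has shape (samples, sub_sequences, timesteps, features).
    Assumption: a timestep whose features all equal pad_value is padding.
    """
    # Level 1: a timestep is real if any of its features differs from the pad value.
    step_mask = np.any(x != pad_value, axis=-1)   # (samples, sub_sequences, timesteps)
    # Level 2: a sub-sequence is real if it contains at least one real timestep.
    seq_mask = step_mask.any(axis=-1)             # (samples, sub_sequences)
    return step_mask, seq_mask

# Toy data with the same layout: 2 samples, 2 sub-sequences, 3 timesteps, 2 features.
x = np.array([
    [[[0.2, 0.3], [0.1, 0.5], [0.0, 0.0]],
     [[0.4, 0.1], [0.0, 0.0], [0.0, 0.0]]],
    [[[0.6, 0.7], [0.0, 0.0], [0.0, 0.0]],
     [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]],   # fully padded sub-sequence
], dtype=np.float32)

step_mask, seq_mask = hierarchical_masks(x)
step_lengths = step_mask.sum(axis=-1)   # real timesteps per sub-sequence
seq_lengths = seq_mask.sum(axis=-1)     # real sub-sequences per sample

print(step_lengths)
print(seq_lengths)
```

As far as I understand, `step_mask` (or `step_lengths`) would be needed by the lower-level LSTM that encodes each sub-sequence, and `seq_mask` (or `seq_lengths`) by the upper-level LSTM that runs over the sub-sequence encodings. What I do not know is how to pass both of them through a model correctly.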

How can I build a hierarchical LSTM model that handles variable-length sequences at both levels?
Can anyone show me an example?
Thanks very much!