How to implement nested transformers: a character-level transformer for words and a word-level transformer for sentences?

To avoid repeating myself, here is a link to the full, detailed question on Data Science Stack Exchange: link
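
To make the architecture in the title concrete, here is a minimal sketch, assuming PyTorch. It is only one possible reading of "nested transformers" and is not taken from the linked question: the class name `NestedTransformer`, the mean pooling of character states into a word vector, the learned positional embeddings, and all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn


class NestedTransformer(nn.Module):
    """Hypothetical sketch: a character-level encoder builds one vector per word,
    then a word-level encoder contextualizes those vectors within the sentence."""

    def __init__(self, n_chars=100, d_model=32, n_heads=4,
                 max_chars=16, max_words=64, pad_id=0):
        super().__init__()
        self.pad_id = pad_id
        self.char_emb = nn.Embedding(n_chars, d_model, padding_idx=pad_id)
        # Learned positional embeddings (an assumption; sinusoidal would also work).
        self.char_pos = nn.Embedding(max_chars, d_model)
        self.word_pos = nn.Embedding(max_words, d_model)
        char_layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=64, batch_first=True)
        word_layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=64, batch_first=True)
        self.char_encoder = nn.TransformerEncoder(
            char_layer, num_layers=1, enable_nested_tensor=False)
        self.word_encoder = nn.TransformerEncoder(
            word_layer, num_layers=1, enable_nested_tensor=False)

    def forward(self, char_ids):
        # char_ids: (batch, n_words, n_chars_per_word), padded with pad_id.
        # Returns a tensor of shape (batch, n_words, d_model).
        b, w, c = char_ids.shape
        flat = char_ids.reshape(b * w, c)            # treat every word as a sequence
        char_mask = flat.eq(self.pad_id)             # True where a character is padding
        empty = char_mask.all(dim=1)                 # words that are entirely padding
        # Unmask one position in empty words so attention never sees an all-masked row.
        char_mask[empty, 0] = False

        pos = torch.arange(c, device=char_ids.device)
        h = self.char_emb(flat) + self.char_pos(pos)
        h = self.char_encoder(h, src_key_padding_mask=char_mask)

        # Mean-pool the non-padding character states into one vector per word.
        keep = (~char_mask).unsqueeze(-1).to(h.dtype)
        word_vecs = (h * keep).sum(dim=1) / keep.sum(dim=1)
        word_vecs = word_vecs.reshape(b, w, -1)

        word_pos = torch.arange(w, device=char_ids.device)
        word_vecs = word_vecs + self.word_pos(word_pos)
        word_mask = empty.reshape(b, w)              # padded words are masked as keys
        return self.word_encoder(word_vecs, src_key_padding_mask=word_mask)


# Toy batch: 2 sentences, 5 words each, 7 characters per word.
model = NestedTransformer()
char_ids = torch.randint(1, 100, (2, 5, 7))
char_ids[:, :, 5:] = 0    # pad the last two characters of every word
char_ids[1, 3:, :] = 0    # pad the last two words of the second sentence
out = model(char_ids)
print(out.shape)
```

The sketch assumes every sentence contains at least one real word; a sentence made up only of padding would leave the word-level attention with no valid keys.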