I have an NLP model that includes two RNNs: one of them, say RNN_Word, works at the word level, and the other, say RNN_Char, works at the character level. The model receives a sentence as input and outputs a label for each word in the sentence.
For each word, the final hidden state of RNN_Char is concatenated with the word's embedding, and this concatenated tensor is fed as input to RNN_Word.
I wonder how I can use mini-batches during training. I could group sentences with the same length (in number of words) and then, within each batch of sentences, group words with the same length (in number of characters), but this procedure seems rather inefficient.
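To make the double-bucketing I have in mind concrete, here is a minimal sketch in plain Python (the helper names `bucket_sentences` and `bucket_words` are just for illustration, not part of my actual model code):

```python
from collections import defaultdict

def bucket_sentences(sentences):
    """Group sentences by their length in words."""
    buckets = defaultdict(list)
    for sent in sentences:  # each sentence is a list of word strings
        buckets[len(sent)].append(sent)
    return dict(buckets)

def bucket_words(sentence_batch):
    """Within one batch of same-length sentences, group words
    by their length in characters, remembering each word's
    (sentence index, word index) so results can be scattered back."""
    buckets = defaultdict(list)
    for i, sent in enumerate(sentence_batch):
        for j, word in enumerate(sent):
            buckets[len(word)].append((i, j, word))
    return dict(buckets)

sentences = [["the", "cat", "sat"], ["a", "dog", "ran"], ["hi", "there"]]
by_len = bucket_sentences(sentences)
# Two buckets: sentences of 3 words and of 2 words.
word_buckets = bucket_words(by_len[3])
```

The inefficiency I worry about is visible even here: every batch fragments into many tiny per-word-length sub-batches, so RNN_Char rarely runs on a large batch.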
I think I cannot use padding to force all words to have the same length, because my loss is defined at the word level and the state of RNN_Char would keep changing while padding characters were fed to it.
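To make the worry concrete, here is a toy recurrence (not my real RNN_Char, just a stand-in) showing that naively reading the state after the last timestep of a padded word gives a different result than stopping at the word's true length:

```python
def rnn_step(state, char):
    # Toy recurrence standing in for one RNN_Char step.
    return 0.5 * state + ord(char) / 1000.0

def final_state(word, pad_to=None):
    """Run the toy RNN over the word, optionally padded with '#',
    and return the state after the final (possibly padded) step."""
    if pad_to is not None:
        word = word + "#" * (pad_to - len(word))
    state = 0.0
    for c in word:
        state = rnn_step(state, c)
    return state

print(final_state("cat"))             # state at the word's true end
print(final_state("cat", pad_to=6))   # state drifts over padding steps
```

The two values differ, which is exactly why I think padding at the character level would corrupt the per-word representation unless the state is read out at each word's true length.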
What do you suggest?