I am implementing a char-CNN over the words in a sentence, in a hierarchical fashion: each word goes through the char CNN, the CNN's output is concatenated with a word embedding, and the result is fed through an RNN over the entire sentence. My CNN (Conv1d) has its kernel size / stride / padding chosen so that the output has the same $N$ and $L$ as the input, so I am treating the outputs as one per character (the length $L$ is the number of characters).
I want to get the char-level word representation by max-pooling over the characters of each word. In a batch, though, the words don’t all have the same length, and I don’t want the max-pool to include the padding characters. This is similar to how the RNN module can ignore padding in a batch when you give it the sequence lengths (via pack_padded_sequence). Any idea how to do this?
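For concreteness, this is the kind of thing I'm after, written as a manual mask (shapes and the helper name masked_max_pool are my own; I'm hoping there's a built-in or more standard way):

```python
import torch

def masked_max_pool(conv_out, lengths):
    """Max-pool over the length dimension, ignoring padding positions.

    conv_out: (N, C, L) CNN outputs, one vector per character
    lengths:  (N,)      number of real (non-pad) characters per word
    """
    N, C, L = conv_out.shape
    # mask[i, j] is True at padding positions j >= lengths[i]
    mask = torch.arange(L, device=conv_out.device)[None, :] >= lengths[:, None]  # (N, L)
    # Set padded positions to -inf so they can never win the max
    masked = conv_out.masked_fill(mask[:, None, :], float('-inf'))
    return masked.max(dim=2).values  # (N, C)

# Example: two words of length 3 and 1, padded to L = 4
x = torch.randn(2, 5, 4)
lengths = torch.tensor([3, 1])
pooled = masked_max_pool(x, lengths)  # (2, 5), padding ignored
```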
I think I could iterate over the token sequences and max-pool each word individually, with the kernel size set to that token’s length, but that seems inefficient. Is there an easier way to do this, or example code somewhere? All the examples I found seemed to ignore this problem and just max-pooled over both the valid and the padding characters in the batch.
edit: I found AdaptiveMaxPool1d, which might be related, but it isn’t clear to me what the input/output sizes should be or whether it’s relevant to my use case.
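As far as I can tell from experimenting (shapes assumed from the docs), AdaptiveMaxPool1d(output_size) takes (N, C, L_in) and pools to (N, C, output_size), choosing the kernel/stride internally; with output_size=1 it is just a max over the whole length dimension, padding positions included, so it doesn't seem to do the masking on its own:

```python
import torch
import torch.nn as nn

# AdaptiveMaxPool1d(1) collapses the length dimension to 1,
# taking the max over ALL positions, padding included.
pool = nn.AdaptiveMaxPool1d(1)
x = torch.randn(2, 5, 7)   # 2 words, 5 channels, 7 (possibly padded) characters
y = pool(x)                # shape (2, 5, 1)
```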