Creating character and syntactic features with TorchText

Hi Everyone,

Any recommendations on a good way to create character and syntactic features (like POS tags) with TorchText?
For character features, as in most state-of-the-art papers, I'm planning to define a CNN + max-pool on top of the character vectors.
Basically, I want to know whether it makes sense to use TorchText for this purpose, as opposed to implementing everything myself.
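
To make the question concrete, here is a rough sketch of the CNN + max-pool idea in plain PyTorch, independent of how the character ids are produced. The class name CharCNN, the layer sizes, and the input layout (batch, words per sentence, characters per word) are my own assumptions for illustration and are not taken from any particular paper or from TorchText.

```python
import torch
import torch.nn as nn


class CharCNN(nn.Module):
    """Hypothetical character encoder: embed chars, convolve, max-pool per word."""

    def __init__(self, num_chars, char_dim=16, num_filters=32, kernel_size=3, pad_idx=0):
        super().__init__()
        self.embed = nn.Embedding(num_chars, char_dim, padding_idx=pad_idx)
        # Padding keeps the output length equal to the word length for odd kernels.
        self.conv = nn.Conv1d(char_dim, num_filters, kernel_size, padding=kernel_size // 2)

    def forward(self, char_ids):
        # char_ids: (batch, seq_len, max_word_len) integer character indices
        b, s, w = char_ids.shape
        x = self.embed(char_ids.reshape(b * s, w))   # (b*s, w, char_dim)
        x = self.conv(x.transpose(1, 2))             # (b*s, num_filters, w)
        x = torch.relu(x).max(dim=2).values          # max over character positions
        return x.reshape(b, s, -1)                   # (batch, seq_len, num_filters)


# Toy input: 2 sentences, 5 words each, 7 characters per word (padded with 0).
model = CharCNN(num_chars=50)
features = model(torch.randint(0, 50, (2, 5, 7)))
```

The resulting per-word vectors would then be concatenated with the word embeddings (and, say, POS tag embeddings) before the sequence model.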


Hello Amir,

Having done both for different problems, I'd say it depends on your taste.
torchtext worked well for me in handwriting generation; it is a great way to keep what is in your dataset transparent and to avoid assembling the batches yourself.

That said, writing my own collate function with pack_sequence from torch.nn.utils.rnn (turning a list of dataset entries into one batch for each field of a dataset item) was all it took to use sequences (on a model not yet on GitHub) with the standard Dataset / DataLoader mechanisms.
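
A minimal sketch of what such a collate function can look like, assuming each dataset entry is a pair of 1-D tensors (token ids and POS tag ids). The function name collate_sequences and the toy data are made up for illustration; this is not the code from my model.

```python
import torch
from torch.nn.utils.rnn import pack_sequence
from torch.utils.data import DataLoader


def collate_sequences(batch):
    """Turn a list of (tokens, tags) entries into one PackedSequence per field."""
    tokens, tags = zip(*batch)
    # enforce_sorted=False lets pack_sequence sort by length internally.
    packed_tokens = pack_sequence(list(tokens), enforce_sorted=False)
    packed_tags = pack_sequence(list(tags), enforce_sorted=False)
    return packed_tokens, packed_tags


# Toy dataset: variable-length sequences of token ids with matching tag ids.
data = [
    (torch.tensor([1, 2, 3]), torch.tensor([4, 5, 6])),
    (torch.tensor([7, 8]), torch.tensor([9, 10])),
    (torch.tensor([11]), torch.tensor([12])),
]

# A plain list works as a map-style dataset for DataLoader.
loader = DataLoader(data, batch_size=2, collate_fn=collate_sequences)
first_tokens, first_tags = next(iter(loader))
```

The packed batches can be passed directly to nn.LSTM or nn.GRU; if you need padded tensors back (for example for a loss over tags), torch.nn.utils.rnn.pad_packed_sequence undoes the packing.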

Best regards