Passing an argument into generate_sp_model()

I’m running generate_sp_model() on a file and getting the following error message:


trainer_interface.cc(346) LOG(WARNING) Found too long line (8849 > 4192).
trainer_interface.cc(348) LOG(WARNING) Too long lines are skipped in the training.
trainer_interface.cc(349) LOG(WARNING) The maximum length can be changed with --max_sentence_length=<size> flag.

I’d like to increase max_sentence_length, but when I try to pass it in as an argument, I get:

generate_sp_model(filename, vocab_size, model_type="unigram", max_sentence_length=10000)
TypeError: generate_sp_model() got an unexpected keyword argument 'max_sentence_length'

How does one go about passing this argument in?

The warning is raised by sentencepiece in these lines of code, and torchtext’s generate_sp_model indeed does not accept this argument, so the call fails with the TypeError.
I’m not familiar enough with torchtext to know exactly how it calls into sentencepiece, but @zhangguanheng66 might know.
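
In the meantime, one possible workaround is to skip the torchtext wrapper and train the model through the sentencepiece Python package directly, since the warning itself says the limit is controlled by the --max_sentence_length flag. The snippet below is only a sketch under a few assumptions: a reasonably recent sentencepiece release whose SentencePieceTrainer.train accepts keyword arguments, and a made-up output prefix "spm_model". The helper names build_trainer_args and train_sp_model are hypothetical and not part of torchtext or sentencepiece.

```python
def build_trainer_args(filename, vocab_size, model_type="unigram",
                       model_prefix="spm_model", max_sentence_length=10000):
    """Collect the trainer options as keyword arguments.

    The keys mirror sentencepiece's command-line flags of the same name.
    "spm_model" is a placeholder prefix chosen for this example.
    """
    return {
        "input": filename,
        "model_prefix": model_prefix,
        "vocab_size": vocab_size,
        "model_type": model_type,
        # Raise the per-line limit that triggered the "too long line" warning.
        "max_sentence_length": max_sentence_length,
    }


def train_sp_model(filename, vocab_size, **options):
    """Train a SentencePiece model directly, bypassing generate_sp_model()."""
    # Imported here so the argument helper works without sentencepiece installed.
    import sentencepiece as spm

    # Assumes a sentencepiece version that supports keyword arguments;
    # older releases may only accept a single flag string such as
    # "--input=... --max_sentence_length=10000".
    spm.SentencePieceTrainer.train(**build_trainer_args(filename, vocab_size, **options))


# Example call (needs sentencepiece and a real training file):
# train_sp_model("corpus.txt", 20000, max_sentence_length=10000)
```

If this runs, it should write spm_model.model and spm_model.vocab to the working directory. I haven't checked whether torchtext's own loading helpers need anything beyond the .model file, so treat that part as untested.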