How to use the transformers Q&A pipeline for a large corpus

navicstein · May 9, 2020, 6:50pm

Hello all, and happy week ahead! I have a small question about scaling up the SQuAD question-answering model (DistilBERT) that Hugging Face's transformers library uses, as described in:
https://huggingface.co/transformers/usage.html#extractive-question-answering

I'm tokenizing the question and the context separately so that the context tensors only need to be loaded once. This comes in handy when the model is deployed to production and served via a REST API, because I don't have to tokenize the context passage every time an endpoint is called. However, these models are designed to find answers within fairly short passages, and my context doesn't fit into the model's maximum input length (DistilBERT accepts at most 512 tokens per input). I need help with how to feed the network a very large context: about 200 pages in a single ".txt" file (something like a bible.txt file).
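One common way around the length limit, sketched here under assumptions, is to split the long text into overlapping windows and run the QA model on each window. The overlap (`stride`) means an answer that crosses a window boundary still appears whole in at least one window. `chunk_words` is a hypothetical helper, and it counts whitespace-separated words rather than model tokens; since the 512-token limit applies to subword tokens (with the question included), `window` should be chosen conservatively. The transformers QA pipeline also splits an over-long context into overlapping spans on its own (controlled by its `doc_stride` and `max_seq_len` arguments), but pre-splitting lets the chunks be computed once at startup. Note that a SQuAD-style model reads the question and the context together as one sequence, so what can be reused across requests is the chunking and tokenization of the context, not the model's output for it.

```python
from typing import List


def chunk_words(text: str, window: int = 200, stride: int = 50) -> List[str]:
    """Split text into overlapping windows of `window` words (hypothetical helper).

    Consecutive windows share `stride` words, so an answer no longer than
    `stride` words that crosses a boundary still appears whole in one window.
    """
    if window <= 0 or not 0 <= stride < window:
        raise ValueError("need window > 0 and 0 <= stride < window")
    words = text.split()
    if not words:
        return []
    step = window - stride
    chunks = []
    # Stop once the previous window already reaches the end of the text.
    for start in range(0, max(len(words) - stride, 1), step):
        chunks.append(" ".join(words[start:start + window]))
    return chunks
```

For example, a 1,000-word text with `window=200` and `stride=50` yields seven windows, starting at words 0, 150, 300, 450, 600, 750 and 900.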

What I’ve searched so far 👇

But I would appreciate it if anyone has a lightweight example of how this can be achieved.
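Running the reader over every window of a 200-page file for each question would be slow. A lightweight pattern is retrieve-then-read: score the chunks against the question with a cheap method, run the QA model only on the best few, and keep the highest-scoring answer. The sketch below assumes the chunks are already available as a list of strings (for example, overlapping windows of the text file). The word-overlap retriever is a deliberately naive stand-in (TF-IDF or BM25 would be the usual choice), and `build_index`, `top_chunks` and `answer` are hypothetical names. The reader is passed in as `qa_fn`: any callable that accepts `question=` and `context=` keyword arguments and returns a dict with `score` and `answer`, which is how a transformers `pipeline("question-answering")` object behaves with its default settings.

```python
import re
from collections import Counter
from typing import Callable, Dict, List, Optional


def _terms(text: str) -> Counter:
    # Lowercased alphanumeric words: a crude stand-in for real tokenization.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(chunks: List[str]) -> List[Counter]:
    # Computed once (e.g. at server startup), not on every request.
    return [_terms(chunk) for chunk in chunks]


def top_chunks(question: str, chunks: List[str], index: List[Counter], k: int = 3) -> List[str]:
    # Naive retriever: score each chunk by how often it contains the question's words.
    q_terms = set(_terms(question))
    scored = [(sum(counts[t] for t in q_terms), i) for i, counts in enumerate(index)]
    # Highest score first; ties keep document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [chunks[i] for _, i in scored[:k]]


def answer(question: str, chunks: List[str], index: List[Counter],
           qa_fn: Callable[..., Dict], k: int = 3) -> Optional[Dict]:
    # Run the reader only on the top-k chunks and keep the best-scoring answer.
    best = None
    for chunk in top_chunks(question, chunks, index, k):
        result = qa_fn(question=question, context=chunk)  # assumes a single dict is returned
        if best is None or result["score"] > best["score"]:
            best = dict(result, context=chunk)
    return best
```

With transformers installed, `pipeline("question-answering", model="distilbert-base-cased-distilled-squad")` could be passed as `qa_fn`. Because `build_index` runs once, each request only pays for the cheap retrieval plus `k` reader calls.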