Cloze by computer

I want to implement a model that allows the computer to fill in the blanks on its own. Each blank provides four options. Only one is correct. Can someone give me some suggestions? I want to know the specific implementation method. Thank you so much.

If I understand correctly, you are trying to predict the correct choice in a multiple choice question.
there are many things to consider before creating a model.
first of all, do you have a dataset of multiple choice questions and the correct answer?
second of all, is your dataset in a certain domain or over multiple domains? what I mean by this, is that, are your questions all in physics, or they are also in math, chemistry, etc.
because these kind of tasks are very domain-specific.

one way to design this, is to put some detectable word like {BLANK} in your corpus.
next for every sentence in your dataset, you will fill every blank by this word.
then you design an RNN model, that takes the sentence sequence as input and the correct word in corpus as output. (this would be a one hot coding vector)

model your sentences as a sequence of words, then give it to your RNN network as input.
I think the network, given that is trained for enough epochs, learns to associate the {BLANK} with the correct word.

Yes, I have a rich data set. Each data in the data set contains a piece of text with individual words deleted. The content involves various aspects. The data set provides four options for each mined word. only one is correct, I need to train the model so that the final model can can find the correct answer from four alternative options according to the provided paragraph with individual words deleted

I tried using the language model, but the results were not satisfactory

two approaches come to my mind:

  1. for each question, you can create 4 sentence sequences, each blank filled with one of the options.
    now you have 4 sentences. label the sentence with the correct option 1 and others 0.
    now if you have N questions, you will have 4*N annotated sentences.
    train an RNN for this. it should work.

  2. fill in the blanks with the correct options. throw wrong options away.
    now you will have many sentences, all correct. now learn you can train any seq2word network.
    this way, given a new sentence with a blank on it. you can prioritize the 4 options and say the one with the highest probability is the correct one.

Yes, I used the second method you mentioned before, but the result is not satisfactory, I will try your first method, thank you very much for your help

you’ll probably need an strong RNN, to be able to capture the time dependent correlations. with a big enough dataset and lots of epochs, it will probably work.
though it is very important to be as domain-specific as possible.
the sentences coming from a large area of topics, will only make it harder for the network to learn anything.

OK, but the data set is provided in advance. I cannot change it. I can only optimize the model. Thanks again for your help.I will try the first method you said with the BERT model