How can I implement this model using a BiLSTM with attention?

I have a dataset in which each sample is a <Question, Document, Answer> triple, and the Answer may or may not appear in the Document. I want to implement a model for it using a BiLSTM with attention.
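"Attention" can mean several things here. One common choice, assumed in this sketch, is dot-product attention from each document word over the question words: every document word gets a weighted summary of the question that is relevant to it. In the NumPy sketch below, `doc_h` and `q_h` are hypothetical stand-ins for the BiLSTM outputs of the document and the question, and `doc_to_question_attention` is a hypothetical helper name:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def doc_to_question_attention(doc_h, q_h):
    """Dot-product attention of each document word over the question words.

    doc_h: (doc_len, d) BiLSTM outputs for the document (assumed input)
    q_h:   (q_len, d)   BiLSTM outputs for the question (assumed input)
    Returns the weights (doc_len, q_len) and, for every document word,
    a weighted sum of question vectors (doc_len, d).
    """
    scores = doc_h @ q_h.T              # (doc_len, q_len) similarity scores
    weights = softmax(scores, axis=1)   # each row sums to 1 over question words
    attended = weights @ q_h            # (doc_len, d) question summary per doc word
    return weights, attended

rng = np.random.default_rng(0)
doc_h = rng.standard_normal((27, 8))    # e.g. 27 document tokens, hidden size 8
q_h = rng.standard_normal((6, 8))       # e.g. 6 question tokens
weights, attended = doc_to_question_attention(doc_h, q_h)
print(weights.shape, attended.shape)    # (27, 6) (27, 8)
```

The attended vectors are usually concatenated with the document's own BiLSTM states before the output layer, so each word's start/end decision can see both the word in context and the parts of the question it matches.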

The model has two input layers: one for the question and one for the document.
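One way to wire up the two inputs is sketched below in PyTorch. This is a minimal sketch under several assumptions: integer token ids with 0 reserved for padding, shared word embeddings, one BiLSTM per input, simplified dot-product attention from document words to question words, a second BiLSTM over the combined states, and two linear heads that give every document word a "start" score and an "end" score. The class name `BiLSTMAttentionQA` and all layer sizes are hypothetical:

```python
import torch
import torch.nn as nn

class BiLSTMAttentionQA(nn.Module):
    """Sketch of a two-input BiLSTM reader with document-to-question attention.

    Layer sizes and the overall layout are assumptions for illustration.
    """

    def __init__(self, vocab_size, embed_dim=100, hidden_dim=64):
        super().__init__()
        # Shared embeddings for both inputs (assumption); id 0 is padding.
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # One BiLSTM per input layer.
        self.q_lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.d_lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        # Second BiLSTM over [document state; attended question state].
        self.model_lstm = nn.LSTM(4 * hidden_dim, hidden_dim, batch_first=True, bidirectional=True)
        # One score per document word for "start" and one for "end".
        self.start_head = nn.Linear(2 * hidden_dim, 1)
        self.end_head = nn.Linear(2 * hidden_dim, 1)

    def forward(self, question_ids, document_ids):
        # question_ids: (batch, q_len), document_ids: (batch, d_len)
        q_h, _ = self.q_lstm(self.embed(question_ids))   # (batch, q_len, 2H)
        d_h, _ = self.d_lstm(self.embed(document_ids))   # (batch, d_len, 2H)

        # Dot-product attention of each document word over the question words.
        scores = torch.bmm(d_h, q_h.transpose(1, 2))     # (batch, d_len, q_len)
        q_pad = question_ids.eq(0).unsqueeze(1)          # True at question padding
        scores = scores.masked_fill(q_pad, float("-inf"))
        weights = torch.softmax(scores, dim=-1)
        attended = torch.bmm(weights, q_h)               # (batch, d_len, 2H)

        m, _ = self.model_lstm(torch.cat([d_h, attended], dim=-1))  # (batch, d_len, 2H)
        start_logits = self.start_head(m).squeeze(-1)    # (batch, d_len)
        end_logits = self.end_head(m).squeeze(-1)        # (batch, d_len)
        return start_logits, end_logits

model = BiLSTMAttentionQA(vocab_size=1000)
question = torch.randint(1, 1000, (2, 6))    # 2 questions, 6 tokens each
document = torch.randint(1, 1000, (2, 27))   # 2 documents, 27 tokens each
start_logits, end_logits = model(question, document)
print(start_logits.shape, end_logits.shape)  # torch.Size([2, 27]) torch.Size([2, 27])
```

In a real batch, padded document positions should also be masked (for example with `masked_fill` to a large negative value) before the start and end logits are turned into probabilities or used to pick a span.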

The output layer should predict, for each word in the document, whether that word is the start of the answer, the end of the answer, or neither.
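At prediction time the per-word start and end scores still have to be turned into a single answer span. A common approach, assumed here, is to choose the pair (i, j) with i <= j that maximizes start score of i plus end score of j, optionally limiting the span length and comparing against a "no answer" score, since the answer may not be in the document. The function `best_span` and its `max_len` and `null_score` parameters are hypothetical:

```python
def best_span(start_scores, end_scores, max_len=15, null_score=None):
    """Return the (start, end) pair maximizing start_scores[i] + end_scores[j]
    subject to i <= j < i + max_len, or None for "no answer".

    Scores are assumed to be log-probabilities (or logits), so adding them
    corresponds to multiplying probabilities. If null_score is given and the
    best span does not beat it, None is returned.
    """
    best, best_score = None, float("-inf")
    for i, s in enumerate(start_scores):
        for j in range(i, min(i + max_len, len(end_scores))):
            score = s + end_scores[j]
            if score > best_score:
                best, best_score = (i, j), score
    if null_score is not None and null_score >= best_score:
        return None
    return best

tokens = "With a prediction by Urbain Le Verrier ,".split()
# Made-up scores for illustration only.
start_scores = [-5, -5, -5, -5, -0.1, -3, -4, -6]
end_scores = [-6, -6, -5, -6, -3, -2, -0.2, -5]
span = best_span(start_scores, end_scores)
print(span, " ".join(tokens[span[0]:span[1] + 1]))  # (4, 6) Urbain Le Verrier
```

The double loop is quadratic in document length but bounded by `max_len`, which is usually fine for paragraph-sized documents.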

For example, given the question as input:
“Who discovered Neptune the planet?”

Given the document as input:
“With a prediction by Urbain Le Verrier , telescopic observations confirming the existence of a major planet were made on the night of September 23–24, 1846”

The answer to the question is “Urbain Le Verrier”.

The output layer should then predict:
“With”: neither the start nor the end of the answer
“a”: neither the start nor the end of the answer
“prediction”: neither the start nor the end of the answer
“by”: neither the start nor the end of the answer
“Urbain”: the start of the answer
“Le”: neither the start nor the end of the answer
“Verrier”: the end of the answer
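To train the output layer, each Answer has to be converted into per-word start and end labels like the ones listed above. A minimal sketch, assuming simple whitespace tokenization and exact token matching (real data usually needs a proper tokenizer and character offsets); `span_labels` is a hypothetical helper, and an answer that does not occur in the document yields all-zero labels:

```python
def span_labels(doc_tokens, answer_tokens):
    """Return per-token start and end labels (1 or 0) for the first exact
    match of answer_tokens in doc_tokens; all zeros if there is no match."""
    n, m = len(doc_tokens), len(answer_tokens)
    start = [0] * n
    end = [0] * n
    if m == 0:
        return start, end
    for i in range(n - m + 1):
        if doc_tokens[i:i + m] == answer_tokens:
            start[i] = 1           # first answer word
            end[i + m - 1] = 1     # last answer word
            break
    return start, end

doc = ("With a prediction by Urbain Le Verrier , telescopic observations "
       "confirming the existence of a major planet were made on the night "
       "of September 23–24, 1846").split()
start, end = span_labels(doc, "Urbain Le Verrier".split())
print(start.index(1), end.index(1))  # 4 6  ("Urbain" starts, "Verrier" ends)
```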

How can I implement this model using a BiLSTM with attention?
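For training, a common choice (assumed here) is to treat "which word is the start" and "which word is the end" as two classification problems over document positions and add their cross-entropy losses. To cover documents that do not contain the answer, this sketch assumes a reserved "no answer" token is prepended at position 0 of every document, so unanswerable samples point both start and end at 0 (the convention BERT-style SQuAD 2.0 models use with their first token). `span_loss` is a hypothetical name, and the logits are random stand-ins for a model's output:

```python
import torch
import torch.nn.functional as F

def span_loss(start_logits, end_logits, start_idx, end_idx):
    """Average of start and end cross-entropy over document positions.

    start_logits, end_logits: (batch, d_len) per-word scores
    start_idx, end_idx: (batch,) gold positions; position 0 is assumed to be
    a reserved "no answer" token prepended to every document.
    """
    return (F.cross_entropy(start_logits, start_idx)
            + F.cross_entropy(end_logits, end_idx)) / 2

torch.manual_seed(0)
start_logits = torch.randn(2, 27, requires_grad=True)
end_logits = torch.randn(2, 27, requires_grad=True)
# Sample 1: "Urbain" .. "Verrier" at 4 .. 6, shifted by 1 for the null token.
# Sample 2: no answer in the document, so both point at position 0.
start_idx = torch.tensor([5, 0])
end_idx = torch.tensor([7, 0])
loss = span_loss(start_logits, end_logits, start_idx, end_idx)
loss.backward()
print(loss.item())
```

To keep the per-word yes/no framing of the question instead, `nn.BCEWithLogitsLoss` on the 0/1 start and end label vectors is an alternative; the softmax over positions used here has the advantage of enforcing exactly one start and one end per document.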