Simplest approach for a beginner to build a glossary formatter

I’m new to machine learning and I’d like to try building a simple model that cleans up the formatting of text, specifically the entries in a glossary.

I guess I have a few fundamental questions.

  1. Will I need a good dataset to train the model to format the text the way I want? Can I generate that data myself, and if so, how much would I need? Or could an unsupervised learning algorithm group similar entries together without any labelled examples?

  2. Is there a single standard neural network for this kind of task, or how do you decide which machine learning architecture to use for a given application?
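
To make the first question more concrete, here is a rough sketch of what I mean by generating the data myself: take a few entries that are already formatted the way I want and deliberately mess them up, so that each (messy, clean) pair could serve as a training example. The entries, the target format "Term: Definition.", and the names `corrupt` and `make_pairs` are all made up for illustration, and I don't know whether corruptions like these resemble real formatting problems closely enough to be useful.

```python
import random

# Hypothetical clean entries in an assumed target format "Term: Definition."
CLEAN_ENTRIES = [
    "Gradient descent: An optimization method that follows the negative gradient.",
    "Overfitting: When a model memorizes training data and generalizes poorly.",
]

def corrupt(entry, rng):
    """Return a messier variant of a clean entry (illustrative corruptions only)."""
    term, definition = entry.split(": ", 1)
    # Swap the ": " separator for one of several inconsistent alternatives.
    separator = rng.choice([" - ", "  ", " = ", ":"])
    # Sometimes lowercase the term.
    if rng.random() < 0.5:
        term = term.lower()
    # Sometimes drop the final period.
    if rng.random() < 0.5:
        definition = definition.rstrip(".")
    return term + separator + definition

def make_pairs(entries, n_per_entry, seed=0):
    """Build (messy, clean) pairs; the seed makes the output reproducible."""
    rng = random.Random(seed)
    return [(corrupt(e, rng), e) for e in entries for _ in range(n_per_entry)]

pairs = make_pairs(CLEAN_ENTRIES, n_per_entry=3)
for messy, clean in pairs:
    print(repr(messy), "->", repr(clean))
```

If this is a sensible way to build a dataset, I could scale it up from whatever clean entries I already have, but I have no feel for how many pairs would be enough.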

Thanks very much.