I’m new to machine learning and I’d like to try building a simple model that improves the formatting of text, specifically the entries in a glossary.
I guess I have a few fundamental questions.
- Is it likely that I will need a good dataset to train the algorithm to format the text the way I want? Is it possible for me to generate that data myself, and if so, how much would I need? Or could an unsupervised learning algorithm group similar entries together without labelled examples?
- Is there a single standard neural network to use, or how do you decide which machine learning architecture to use for a given application?
Thanks very much.