I have a collection of about 20K strings produced by a black-box function. Each string is 88 ASCII characters long, stored one per line in a file.
My initial thought was to convert each character of each line to its numerical ASCII value and feed the result into an 88-neuron-wide input layer; for example:
>>> s = 'AQCfRmKcnAruzC0L8ree7ToDczoHj8b-S30pFqhKu2qldji8awymRepScQGB6my3HoSkXkcse6BlllLm62rnPnxw'
>>> [ord(c) for c in s]
[65, 81, 67, 102, 82, 109, 75, 99, 110, 65, 114, 117, 122, 67, 48, 76, 56, 114, 101, 101, 55, 84, 111, 68, 99, 122, 111, 72, 106, 56, 98, 45, 83, 51, 48, 112, 70, 113, 104, 75, 117, 50, 113, 108, 100, 106, 105, 56, 97, 119, 121, 109, 82, 101, 112, 83, 99, 81, 71, 66, 54, 109, 121, 51, 72, 111, 83, 107, 88, 107, 99, 115, 101, 54, 66, 108, 108, 108, 76, 109, 54, 50, 114, 110, 80, 110, 120, 119]
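Concretely, I would batch that conversion over the whole file into one array, something like the rough sketch below (strings.txt is a placeholder name for my input file, and dividing by 127 is just one way to normalize the values into [0, 1]):

    import numpy as np

    # Read all lines, skipping any blank ones.
    with open('strings.txt') as f:
        lines = [line.strip() for line in f if line.strip()]

    # Build an (N, 88) array of ASCII codes, one row per string.
    X = np.array([[ord(c) for c in s] for s in lines], dtype=np.float32)

    # Scale into [0, 1] so the network doesn't see raw values up to ~122.
    X /= 127.0

    print(X.shape)  # expected: (~20000, 88)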
The first 90% of the file would be used to train the net, with the remaining 10% held out as the validation set. I am not sure how deep the ANN should be, but I plan to experiment with those parameters once I get an initial proof of concept running.
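In code, the split I have in mind is just this (again a sketch; X is the array from the snippet above):

    # 90/10 split on the row order of the file. No shuffling here; if
    # line order carries any pattern, shuffling first would be safer.
    split = int(0.9 * len(X))
    X_train, X_val = X[:split], X[split:]
    print(X_train.shape, X_val.shape)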
Any ideas? Thanks in advance.