If your dataset is indeed too large to fit into memory, you need to split it and train on the different “sub-datasets” one after another. In this case, I would actually preprocess the text data completely beforehand, so that the files already contain the sequences of indices. This speeds up training, because otherwise you would potentially have to redo the preprocessing in every epoch.
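Something along these lines is what I have in mind; it is only a minimal sketch, assuming you already have a `vocab` dict mapping tokens to indices, and the helper names (`preprocess_shard`, `train_on_shards`, `train_step`) are made up for illustration:

```python
import torch

# Assumed to exist already: a vocab dict (token -> index) and your tokenizer.
tokenize = lambda x: x.split()

def preprocess_shard(txt_path, vocab, out_path, unk_index=0):
    """Convert one raw-text shard into index sequences and save it once."""
    sequences = []
    with open(txt_path, encoding="utf-8") as f:
        for line in f:
            indices = [vocab.get(tok, unk_index) for tok in tokenize(line)]
            sequences.append(torch.tensor(indices, dtype=torch.long))
    torch.save(sequences, out_path)  # later epochs just load this file

def train_on_shards(shard_paths, model, train_step):
    """One epoch: load the pre-indexed shards one after another."""
    for shard in shard_paths:
        sequences = torch.load(shard)  # one shard fits in memory
        for seq in sequences:
            train_step(model, seq)
```

The point is that the expensive text-to-index conversion happens exactly once per shard, and the training loop only ever deals with already-indexed tensors.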
As a side comment, when using tokenize = lambda x:x.split(), I hope your input text documents have whitespace before and after punctuation marks, which is not a given in user-generated data.
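If they don’t, a slightly more robust drop-in than plain str.split() is a regex-based tokenizer that splits punctuation off as separate tokens. The exact regex here is just a sketch; a proper tokenizer such as spaCy’s or NLTK’s word_tokenize is usually the safer choice for user-generated text:

```python
import re

def tokenize(text):
    # Split into runs of word characters or single punctuation marks,
    # so "great,thanks!!" becomes ["great", ",", "thanks", "!", "!"].
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("great,thanks!!No spaces here..."))
# ['great', ',', 'thanks', '!', '!', 'No', 'spaces', 'here', '.', '.', '.']
```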