(b) identifying each token in the text, where each token is selected from the group of tokens consisting of text, at least one number, at least one space, punctuation, at least one end-of-line indicator, and any combination thereof, where an alpha-numeric portion of a token is a word, and where the leading non-alpha-numeric portion of a token is a pre-word;
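By way of illustration only, the token identification of step (b) might be sketched as follows. The sketch assumes that a token is a run of non-alpha-numeric characters (the pre-word) followed by a run of ASCII alpha-numeric characters (the word), and that any trailing non-alpha-numeric run is kept as a token with an empty word. The names `Token` and `identify_tokens`, and the choice of ASCII character classes, are hypothetical and are not taken from the claim.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    """One token: leading non-alpha-numeric part plus alpha-numeric part."""
    pre_word: str  # spaces, punctuation, end-of-line indicators before the word
    word: str      # letters and digits; empty for a trailing non-word run


# Assumption: "alpha-numeric" means ASCII letters and digits.
# First alternative: optional pre-word followed by a word.
# Second alternative: a leftover non-alpha-numeric run with no word after it.
_TOKEN_RE = re.compile(r"([^A-Za-z0-9]*)([A-Za-z0-9]+)|([^A-Za-z0-9]+)")


def identify_tokens(text: str) -> List[Token]:
    """Split text into (pre_word, word) tokens, left to right."""
    tokens: List[Token] = []
    for match in _TOKEN_RE.finditer(text):
        if match.group(2) is not None:
            tokens.append(Token(match.group(1), match.group(2)))
        else:
            # Trailing punctuation, spaces, or end-of-line with no word.
            tokens.append(Token(match.group(3), ""))
    return tokens
```

Under these assumptions, concatenating the pre-word and word of every token in order reproduces the original text, so no character is lost in tokenization.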