Image Number 4 for United States Patent #4597057.
Standard ASCII-coded text is divided into alpha, numeric, and punctuation tokens. Each token is converted to a string of four-bit nibbles. One nibble is coded to identify the type of token. Additional nibbles are coded to identify the location, if any, of a corresponding alpha or punctuation token in a global dictionary. If no corresponding alpha token is in the dictionary, the alpha token is divided into prefixes, suffixes, and a stem. The locations of any prefixes in a table of prefixes and of any suffixes in a table of suffixes, together with the number of characters in the remaining stem and the location of each of those characters in a character table, are then coded and stored as part of the string of four-bit nibbles for the alpha token. Numeric tokens are stored as a string of four-bit nibbles in which the first nibble identifies the type of token, the next nibble gives the length, and one nibble follows for each of the digits.
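The token split and the numeric-token layout described above can be illustrated with a minimal Python sketch. It is not the patented implementation. The type code value TYPE_NUMERIC, the function names, the dropping of whitespace, the 15-digit limit implied by a single length nibble, and the high-nibble-first packing are all assumptions made for illustration; the abstract specifies none of them.

```python
import re

# Hypothetical type code: the abstract does not state the actual nibble values.
TYPE_NUMERIC = 0x1

# Runs of letters, runs of digits, or a single punctuation character.
# Whitespace is simply skipped here (an assumption).
TOKEN_RE = re.compile(r"[A-Za-z]+|[0-9]+|[^A-Za-z0-9\s]")


def tokenize(text):
    """Split ASCII text into (kind, token) pairs: alpha, numeric, or punct."""
    tokens = []
    for tok in TOKEN_RE.findall(text):
        if tok.isalpha():
            kind = "alpha"
        elif tok.isdigit():
            kind = "numeric"
        else:
            kind = "punct"
        tokens.append((kind, tok))
    return tokens


def encode_numeric(token):
    """Encode a digit string as nibbles: type, length, then one nibble per digit."""
    if not re.fullmatch(r"[0-9]+", token):
        raise ValueError("numeric token must contain only ASCII digits")
    if len(token) > 15:
        # A single nibble can hold a length of at most 15. How longer runs
        # are handled is not stated in the abstract, so this sketch rejects them.
        raise ValueError("length does not fit in one nibble")
    return [TYPE_NUMERIC, len(token)] + [int(ch) for ch in token]


def pack_nibbles(nibbles):
    """Pack nibbles two per byte, high nibble first, zero-padding an odd count."""
    padded = list(nibbles) + ([0] if len(nibbles) % 2 else [])
    return bytes((padded[i] << 4) | padded[i + 1] for i in range(0, len(padded), 2))


if __name__ == "__main__":
    print(tokenize("Call 911, now!"))
    print(encode_numeric("1984"))
    print(pack_nibbles(encode_numeric("1984")).hex())
```

Under these assumptions a four-digit number occupies six nibbles (three bytes) rather than four ASCII bytes. Alpha and punctuation tokens would additionally need the dictionary, prefix, suffix, and character tables, which the abstract mentions but does not enumerate, so they are left out of the sketch.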