In a document search and retrieval system, document images are segmented into layout objects. Each layout object identifies a distinct structural element in a document image. In addition, the system computes attributes and features for each segmented layout object. Before any document im
A programming interface of a document search system enables a user to dynamically specify features of documents recorded in a corpus of documents. The programming interface provides category and format flexibility for defining different genres of documents. The document search system
A method and apparatus for compressing a corpus of document images into a collective tokenized representation. Initially, documents in the corpus are individually compressed into a document tokenized format. A document image in the document tokenized format is represented using a symbol
A document search system provides a user with a programming interface for dynamically specifying features of documents recorded in a corpus of documents. The programming interface operates at a high level that is suitable for interactive user specification of layout components and st