Generating document templates that are robust to structural variations

United States Patent #7668942

A template, or wrapper tree, for a document such as a web page is generalized from the bottom up, that is, from the leaves toward the root of the template's logical tree structure. At a given level in the tree, sub-trees are clustered and the clustered sub-trees are generalized. One way to do this is to generate a nested pattern regular expression from the sub-tree clusters, merge sub-trees according to that expression, and then replace the sub-trees at the given level of the template's tree-based regular expression with the merged sub-trees. The process is then repeated at the next higher level, progressing from leaf toward root, until the tree-based regular expression that represents the template is fully generalized.
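
The bottom-up pass described above can be illustrated with a small sketch. This is not the patented method itself; it is a simplified illustration under stated assumptions: sub-trees are "clustered" only when they are adjacent siblings with the same structural signature, and a merged cluster is marked as repeating ("one or more"). The names `Node`, `signature`, `merge`, `generalize`, and `render` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    tag: str
    children: List["Node"] = field(default_factory=list)
    repeat: bool = False  # True means "one or more" of this sub-tree


def signature(node: Node) -> str:
    # Assumed clustering key: tag plus child signatures, ignoring repeat flags.
    return node.tag + "(" + ",".join(signature(c) for c in node.children) + ")"


def merge(a: Node, b: Node) -> Node:
    # Merge two sub-trees that share a signature; a child repeats if it
    # repeats in either input.
    kids = [merge(x, y) for x, y in zip(a.children, b.children)]
    return Node(a.tag, kids, a.repeat or b.repeat)


def generalize(node: Node) -> Node:
    # Leaves first: generalize every child before handling this level.
    kids = [generalize(c) for c in node.children]
    merged: List[Node] = []
    for kid in kids:
        if merged and signature(merged[-1]) == signature(kid):
            # Same cluster as the previous sibling: merge and mark as repeating.
            combined = merge(merged[-1], kid)
            combined.repeat = True
            merged[-1] = combined
        else:
            merged.append(kid)
    return Node(node.tag, merged, node.repeat)


def render(node: Node) -> str:
    # Print the tree as a regex-like string, with '+' marking repetition.
    inner = " ".join(render(c) for c in node.children)
    text = f"{node.tag}({inner})" if node.children else node.tag
    return text + "+" if node.repeat else text


page = Node("div", [
    Node("ul", [Node("li"), Node("li"), Node("li")]),
    Node("ul", [Node("li")]),
    Node("p"),
])
print(render(generalize(page)))  # div(ul(li+)+ p)
```

Here the two `ul` lists differ in length, yet after their `li` children are generalized at the lower level they share a signature, so the higher level can merge them into a single repeating `ul(li+)+` pattern. That is the sense in which generalizing from the leaves upward makes the template tolerant of structural variation.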
