Generating document templates that are robust to structural variations

United States Patent #7668942

A template or wrapper tree for a document such as a web page is generalized from the bottom up, that is, from the leaves toward the root of the template's logical tree structure. At a given level of the tree, sub-trees are clustered and the clustered sub-trees are generalized; the process then repeats at the next higher level, yielding a generalized template or wrapper tree. Generalizing one level involves three steps: generating a nested pattern regular expression from the sub-tree clusters, merging sub-trees according to that nested pattern regular expression, and replacing the sub-trees at that level of the template's (or wrapper's) tree-based regular expression with the merged sub-trees. These steps are repeated level by level, from leaf toward root, until the wrapper, or the tree-based regular expression that represents the template, is fully generalized.
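The bottom-up generalization described above can be illustrated with a small Python sketch. It is not the patented method: the class `Node` and the functions `generalize`, `nested_pattern`, `collapse_once` and `group` are hypothetical names, clustering sub-trees by their root tag is an assumed stand-in for whatever similarity measure the patent actually uses, and structural variants within a cluster are merged by simple alternation. Recursion handles each node only after its children (post-order), which walks the tree from leaf toward root. For brevity, each cluster is merged before the nested pattern is built over the merged expressions, rather than building the pattern over cluster labels first.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical logical-tree node of a template (e.g. an HTML element)."""
    tag: str
    children: list[Node] = field(default_factory=list)


def group(block: list[str]) -> str:
    """Wrap a repeated block as a one-or-more group."""
    text = " ".join(block)
    # A lone merged alternation such as "(x|y)" is already grouped; just add "+".
    if len(block) == 1 and text.startswith("(") and text.endswith(")"):
        return text + "+"
    return "(" + text + ")+"


def collapse_once(items: list[str]) -> list[str]:
    """One left-to-right pass folding the shortest tandem repeat at each position."""
    out: list[str] = []
    i, n = 0, len(items)
    while i < n:
        for k in range(1, (n - i) // 2 + 1):
            block = items[i:i + k]
            reps = 1
            # Count how many times the block repeats back to back.
            while items[i + reps * k:i + (reps + 1) * k] == block:
                reps += 1
            if reps >= 2:
                out.append(group(block))
                i += reps * k
                break
        else:
            # No repeat starts at position i: keep the item unchanged.
            out.append(items[i])
            i += 1
    return out


def nested_pattern(items: list[str]) -> list[str]:
    """Collapse repeats until nothing changes, so groups can nest."""
    while True:
        collapsed = collapse_once(items)
        if collapsed == items:
            return items
        items = collapsed


def generalize(node: Node) -> str:
    """Return a tree-based regular expression for the sub-tree rooted at node."""
    if not node.children:
        return node.tag
    # 1. Generalize every child first (leaf toward root).
    exprs = [generalize(child) for child in node.children]
    # 2. Cluster child sub-trees; hypothetical similarity: same root tag.
    clusters: dict[str, list[str]] = {}
    for child, expr in zip(node.children, exprs):
        members = clusters.setdefault(child.tag, [])
        if expr not in members:
            members.append(expr)
    # 3. Merge each cluster; differing structural variants become an alternation.
    merged = {
        tag: members[0] if len(members) == 1 else "(" + "|".join(members) + ")"
        for tag, members in clusters.items()
    }
    # 4. Replace each child with its cluster's merged expression and
    #    build the nested pattern over the resulting sequence.
    labels = [merged[child.tag] for child in node.children]
    return node.tag + "[" + " ".join(nested_pattern(labels)) + "]"
```

For a list whose items differ in structure, the variants fold into one repeating group: `generalize(Node("ul", [Node("li", [Node("a")]), Node("li", [Node("a"), Node("span")]), Node("li", [Node("a")])]))` returns `"ul[(li[a]|li[a span])+]"`, and a `dl` with alternating `dt`/`dd` children becomes `"dl[(dt dd)+]"`. Repeating the collapse pass until it stops changing is what lets groups nest, e.g. `a a b a a b` becomes `((a)+ b)+`.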
