Generating document templates that are robust to structural variations

Image Number 5 for United States Patent #7668942.

A template, or wrapper tree, for a document such as a web page is generalized from the bottom up, that is, from the leaves toward the root of the template's logical tree structure. At a given level of the tree, sub-trees are clustered and the clustered sub-trees are generalized; the process then repeats at the next higher level, yielding a generalized template or wrapper tree. One way to do this is to generate a nested-pattern regular expression from the sub-tree clusters, merge sub-trees according to that expression, and replace the sub-trees at the given level of the template's tree-based regular expression with the merged sub-trees. These steps repeat level by level, from leaf toward root, until the tree-based regular expression that represents the template is fully generalized.
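The bottom-up pass described above can be illustrated with a minimal Python sketch. It is not the patented method: the `Node` class and the `signature` and `generalize` functions are hypothetical names introduced here, clustering is simplified to grouping adjacent sibling sub-trees whose regex-like signatures are identical, and merging simply keeps one representative marked as repeatable (`+`). The nested-pattern regular expression and merge steps named in the abstract are likely more involved.

```python
from dataclasses import dataclass, field
from itertools import groupby


@dataclass
class Node:
    """Hypothetical template node; not a structure taken from the patent."""
    tag: str
    children: list["Node"] = field(default_factory=list)
    repeat: bool = False  # True means "one or more of this sub-tree"


def signature(node: Node) -> str:
    """Render a sub-tree as a regex-like string, e.g. 'ul(li+)'."""
    inner = ",".join(signature(c) for c in node.children)
    text = f"{node.tag}({inner})" if inner else node.tag
    return text + "+" if node.repeat else text


def generalize(node: Node) -> Node:
    """Generalize from leaf toward root: children first, then this level."""
    kids = [generalize(c) for c in node.children]
    merged = []
    # Assumption: a "cluster" is a run of adjacent siblings whose
    # already-generalized signatures are exactly equal.
    for _, run in groupby(kids, key=signature):
        run = list(run)
        rep = run[0]
        # Merge the run into one representative; mark it repeatable
        # when the run held more than one sub-tree.
        merged.append(Node(rep.tag, rep.children, rep.repeat or len(run) > 1))
    return Node(node.tag, merged, node.repeat)


# Two lists of different lengths collapse to the same generalized pattern.
page = Node("div", [
    Node("ul", [Node("li"), Node("li"), Node("li")]),
    Node("ul", [Node("li"), Node("li")]),
])
print(signature(generalize(page)))  # prints: div(ul(li+)+)
```

Because the children are generalized before their parent, the two `ul` sub-trees already share the signature `ul(li+)` by the time the `div` level is processed, so they fall into one cluster even though the original lists differ in length. This is the sense in which working from the leaves upward makes the template tolerant of structural variation.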
