Generating document templates that are robust to structural variations

United States Patent #7668942

A template or wrapper tree for a document such as a web page is generalized from the bottom up, that is, from the leaves toward the root of the template's logical tree structure. At a given level in the tree, sub-trees are clustered and each cluster is generalized. This can be done by generating a nested pattern regular expression based on the sub-tree clusters, merging sub-trees based on that nested pattern regular expression, and then replacing the sub-trees at the given level in the tree-based regular expression of the template or wrapper with the merged sub-trees. The process is then repeated at the next higher level of the tree, progressing from leaf toward root, until the wrapper, or tree-based regular expression that represents the template, is fully generalized.
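The bottom-up pass described above can be illustrated with a minimal Python sketch. It is not the patented method: clustering is reduced here to grouping adjacent sibling sub-trees that have the same structural signature, and the nested pattern regular expression is reduced to a single "one or more" flag. The names Node, signature, merge, generalize and to_pattern are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    tag: str
    children: List["Node"] = field(default_factory=list)
    repeat: bool = False  # True: this sub-tree may occur one or more times


def signature(node: Node) -> str:
    """Structural key used for clustering (assumption: repeat flags are ignored)."""
    return node.tag + "(" + ",".join(signature(c) for c in node.children) + ")"


def merge(a: Node, b: Node) -> Node:
    """Merge two sub-trees with equal signatures into one generalized sub-tree."""
    kids = [merge(x, y) for x, y in zip(a.children, b.children)]
    return Node(a.tag, kids, a.repeat or b.repeat)


def generalize(node: Node) -> Node:
    """Generalize children first, then cluster and merge them at this level."""
    kids = [generalize(c) for c in node.children]
    out: List[Node] = []
    for kid in kids:
        if out and signature(out[-1]) == signature(kid):
            merged = merge(out[-1], kid)
            merged.repeat = True  # adjacent duplicates collapse into "one or more"
            out[-1] = merged
        else:
            out.append(kid)
    return Node(node.tag, out, node.repeat)


def to_pattern(node: Node) -> str:
    """Render the generalized tree as a small tree-shaped pattern string."""
    inner = " ".join(to_pattern(c) for c in node.children)
    body = f"{node.tag}[{inner}]" if inner else node.tag
    return f"({body})+" if node.repeat else body


# Two table rows with different numbers of cells.
page = Node("table", [
    Node("tr", [Node("td"), Node("td")]),
    Node("tr", [Node("td"), Node("td"), Node("td")]),
])
print(to_pattern(generalize(page)))  # table[(tr[(td)+])+]
```

In this sketch the two rows can only be merged because their cells were generalized first: once both rows have become tr[(td)+], their signatures match at the next level up. A pass that started at the root would see rows with two and three cells as structurally different.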
