Generating document templates that are robust to structural variations

United States Patent #7668942

A template or wrapper tree for a document such as a web page is generalized from the bottom up, that is, from the leaves toward the root of the template's logical tree structure. At a given level of the tree, sub-trees are clustered and each cluster is generalized; the process then repeats at the next higher level, yielding a generalized template or wrapper tree. Concretely, this can be done by generating a nested-pattern regular expression from the sub-tree clusters, merging sub-trees according to that nested-pattern regular expression, and then replacing the sub-trees at that level in the template's (or wrapper's) tree-based regular expression with the merged sub-trees. These steps are repeated level by level, from leaf toward root, until the wrapper, or tree-based regular expression representing the template, is fully generalized.
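The bottom-up procedure described in the abstract can be illustrated with a small Python sketch. It is a minimal illustration under stated assumptions, not the patented method: the `Node` class, the `generalize` function, the choice to cluster sibling sub-trees by root tag, the alternation-based merge, and the `tag[children]` pattern notation are all hypothetical choices made for this example. Post-order recursion stands in for the level-by-level sweep, since it likewise processes every node only after all of its descendants have been generalized.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import groupby


@dataclass
class Node:
    """Hypothetical DOM-like node: a tag name plus ordered child sub-trees."""
    tag: str
    children: list[Node] = field(default_factory=list)


def generalize(node: Node) -> str:
    """Return a tree-based regular expression for `node`, built leaf-to-root.

    Every child is generalized before its parent, so each level works on
    sub-trees that the level below has already generalized.
    """
    if not node.children:
        return node.tag  # a leaf is its own pattern

    # Generalize the level below first: (root tag, pattern) per child sub-tree.
    child_patterns = [(c.tag, generalize(c)) for c in node.children]

    # 1. Cluster sibling sub-trees. Assumed criterion: same root tag.
    clusters: dict[str, set[str]] = {}
    for tag, pattern in child_patterns:
        clusters.setdefault(tag, set()).add(pattern)

    # 2. Nested pattern over cluster labels: a run of adjacent siblings from
    #    the same cluster becomes one repeated group, e.g. li li li -> (li)+.
    label_pattern = [
        (tag, len(list(run)) > 1)
        for tag, run in groupby(child_patterns, key=lambda tp: tp[0])
    ]

    # 3. Merge each cluster into one generalized sub-tree. Assumed merge:
    #    an alternation of its distinct variants, sorted for stable output.
    merged = {tag: sorted(patterns) for tag, patterns in clusters.items()}

    # 4. Replace each cluster label in the nested pattern with its merged sub-tree.
    parts = []
    for tag, repeated in label_pattern:
        body = "|".join(merged[tag])
        if repeated:
            parts.append(f"({body})+")
        elif len(merged[tag]) > 1:
            parts.append(f"({body})")
        else:
            parts.append(body)
    return f"{node.tag}[{' '.join(parts)}]"


# Three list items, one of which carries an extra <b> child.
page = Node("ul", [
    Node("li", [Node("a")]),
    Node("li", [Node("a")]),
    Node("li", [Node("a"), Node("b")]),
])
print(generalize(page))  # ul[(li[a b]|li[a])+]
```

The resulting pattern accepts list items with or without the extra child, which is the kind of robustness to structural variation the title refers to. Alternation is the crudest possible merge; a fuller implementation might instead align the variants and mark differing children as optional (for example `li[a (b)?]`).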
