Generating document templates that are robust to structural variations

Image Number 3 for United States Patent #7668942.

A template, or wrapper tree, for a document such as a web page is generalized from the bottom up, that is, from the leaves toward the root of the template's logical tree structure. At a given level of the tree, sub-trees are clustered and the clustered sub-trees are generalized; the process then repeats at the next higher level. One way to do this is to generate a nested-pattern regular expression from the sub-tree clusters, merge sub-trees according to that expression, and replace the corresponding sub-trees in the template's tree-based regular expression at that level with the merged sub-trees. These steps repeat level by level, from leaf toward root, until the tree-based regular expression that represents the template is fully generalized.
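The bottom-up idea can be illustrated with a small sketch. This is not the patented method: it assumes a much simpler clustering rule (adjacent sibling sub-trees cluster only when their already-generalized structure is identical) and a trivial merge (a cluster collapses into one sub-tree marked "+", meaning one or more). The names `Node`, `pattern`, `cluster_key`, and `generalize` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of a template tree (hypothetical structure for this sketch)."""
    tag: str
    children: List["Node"] = field(default_factory=list)
    repeat: bool = False  # True means "one or more of this sub-tree"


def pattern(node: Node) -> str:
    """Render a sub-tree as a tree-style regular expression string."""
    inner = " ".join(pattern(child) for child in node.children)
    text = f"{node.tag}({inner})" if inner else node.tag
    return (text + "+") if node.repeat else text


def cluster_key(node: Node) -> str:
    """Structure of a sub-tree, ignoring its own top-level repeat marker."""
    return pattern(Node(node.tag, node.children))


def generalize(node: Node) -> Node:
    """Generalize a template tree from the leaves toward the root."""
    # Lower levels first, so siblings are compared in generalized form.
    kids = [generalize(child) for child in node.children]

    merged: List[Node] = []
    for kid in kids:
        # Assumed clustering rule: adjacent siblings with identical structure.
        if merged and cluster_key(merged[-1]) == cluster_key(kid):
            merged[-1].repeat = True  # merge the cluster into one "+" sub-tree
        else:
            merged.append(kid)

    # Replace the original children at this level with the merged sub-trees.
    return Node(node.tag, merged, node.repeat)


# Two table rows with different numbers of cells.
page = Node("table", [
    Node("tr", [Node("td"), Node("td")]),
    Node("tr", [Node("td"), Node("td"), Node("td")]),
])

print(pattern(page))              # table(tr(td td) tr(td td td))
print(pattern(generalize(page)))  # table(tr(td+)+)
```

The example shows why the order matters under these assumptions: the two rows differ before generalization, so they would not cluster at the table level. Once each row has been generalized to `tr(td+)`, they become identical and merge into a single repeated row.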
