Generating document templates that are robust to structural variations

United States Patent #7668942

A template or wrapper tree for a document such as a web page is generalized from the bottom up, that is, from the leaves toward the root of the template's logical tree structure. At a given level of the tree, sub-trees are clustered and the clustered sub-trees are generalized. One way to do this is to generate a nested pattern regular expression from the sub-tree clusters, merge sub-trees according to that expression, and then replace the sub-trees at the given level of the template's tree-based regular expression with the merged sub-trees. The process is repeated at the next higher level, progressing from leaf toward root, until the tree-based regular expression that represents the template or wrapper is fully generalized.
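The bottom-up pass described above can be illustrated with a minimal sketch. It is not the patented method: clustering is reduced to an exact match on a hypothetical `shape` key, and only adjacent siblings are merged. The names `Node`, `shape`, `merge`, `generalize`, and `to_regex` are assumptions made for illustration, and the `+` marker stands in for the nested pattern regular expression.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    tag: str
    children: List["Node"] = field(default_factory=list)
    repeat: bool = False  # True means "one or more", like + in a regex


def shape(node: Node) -> str:
    """Hypothetical clustering key: the tag structure, ignoring repeat flags."""
    return f"{node.tag}({','.join(shape(c) for c in node.children)})"


def merge(a: Node, b: Node) -> Node:
    """Merge two sub-trees of equal shape, keeping any repetition found in either."""
    kids = [merge(x, y) for x, y in zip(a.children, b.children)]
    return Node(a.tag, kids, a.repeat or b.repeat)


def generalize(node: Node) -> Node:
    """Generalize the children first (leaf toward root), then this level."""
    kids = [generalize(c) for c in node.children]
    out: List[Node] = []
    for kid in kids:
        # Adjacent siblings with the same shape fall into one cluster
        # (an assumed, simplified stand-in for real sub-tree clustering).
        if out and shape(out[-1]) == shape(kid):
            merged = merge(out[-1], kid)
            merged.repeat = True
            out[-1] = merged
        else:
            out.append(kid)
    return Node(node.tag, out, node.repeat)


def to_regex(node: Node) -> str:
    """Render the tree as a small tree-based regular expression string."""
    inner = ",".join(to_regex(c) for c in node.children)
    body = f"{node.tag}({inner})" if inner else node.tag
    return f"({body})+" if node.repeat else body


# Two lists of different lengths collapse into one repeated pattern.
page = Node("body", [
    Node("h1"),
    Node("ul", [Node("li"), Node("li"), Node("li")]),
    Node("ul", [Node("li")]),
])
print(to_regex(generalize(page)))  # body(h1,(ul((li)+))+)
```

Because the `li` items are generalized before the `ul` lists that contain them, the two lists end up with the same shape and can be merged one level up, which is the reason for working from the leaves toward the root.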
