Generating document templates that are robust to structural variations

Image Number 5 for United States Patent #7668942.

A template, or wrapper tree, for a document such as a web page is generalized from the bottom up, that is, from the leaves toward the root of the template's logical tree structure. At a given level in the tree, sub-trees are clustered and the clustered sub-trees are generalized; the process is then repeated at the next higher level. One way to do this is to generate a nested pattern regular expression from the sub-tree clusters, merge sub-trees based on that expression, and replace the sub-trees at the given level of the template's tree-based regular expression with the merged sub-trees. The steps repeat level by level until the tree-based regular expression that represents the template is fully generalized.
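The bottom-up pass described above can be illustrated with a small sketch. It is not the patented method: the Node class, the signature and generalize functions, and the "+" notation are hypothetical names chosen for illustration, and clustering is simplified to grouping adjacent sibling sub-trees whose generalized structure is identical, rather than deriving a full nested pattern regular expression.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical template node; not a structure defined by the patent."""
    tag: str
    children: List["Node"] = field(default_factory=list)
    repeated: bool = False  # True if the node stands for one or more similar siblings


def signature(node: Node, with_repeat: bool = True) -> str:
    """Render a sub-tree as a simple tree-regex string, e.g. 'ul(li(a)+)'."""
    inner = "".join(signature(child) for child in node.children)
    body = f"{node.tag}({inner})" if inner else node.tag
    return body + ("+" if with_repeat and node.repeated else "")


def generalize(node: Node) -> Node:
    """Generalize children first (leaf toward root), then merge similar siblings."""
    kids = [generalize(child) for child in node.children]
    merged: List[Node] = []
    for kid in kids:
        # Simplified clustering (an assumption): adjacent siblings with the
        # same generalized structure fall into one cluster and are merged.
        same = merged and signature(merged[-1], False) == signature(kid, False)
        if same:
            merged[-1].repeated = True
        else:
            merged.append(kid)
    return Node(node.tag, merged, node.repeated)


def li() -> Node:
    return Node("li", [Node("a")])


# Two lists with different item counts become identical once their items
# are generalized, so they can be merged one level higher.
page = Node("html", [Node("body", [
    Node("ul", [li(), li()]),
    Node("ul", [li(), li(), li()]),
])])

print(signature(generalize(page)))  # html(body(ul(li(a)+)+))
```

In this sketch the two ul sub-trees differ before generalization (two items versus three) but match afterwards, which is why working from the leaves upward lets the higher level merge them.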
