Generating document templates that are robust to structural variations

United States Patent #7668942

A template, or wrapper tree, for a document such as a web page is generalized from the bottom up, that is, from the leaves toward the root of the template's logical tree structure. At a given level in the tree, sub-trees are clustered and the clustered sub-trees are generalized; the process is then repeated at the next higher level. One way to do this is to generate a nested-pattern regular expression from the sub-tree clusters, merge sub-trees according to that expression, and replace the sub-trees at the given level of the template's tree-based regular expression with the merged sub-trees. Repeating these steps level by level, from leaf toward root, continues until the tree-based regular expression (the wrapper) that represents the template is fully generalized.
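The bottom-up idea can be illustrated with a minimal Python sketch. It is not the patented method: the `Node` class and the `shape`, `signature`, and `generalize` functions are hypothetical names introduced here, clustering is reduced to grouping adjacent siblings whose generalized shapes are exactly equal, and the nested-pattern regular expression is reduced to a single "one or more" marker (`+`).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical template-tree node (not taken from the patent)."""
    tag: str
    children: List["Node"] = field(default_factory=list)
    repeat: bool = False  # True means "one or more" of this sub-tree


def shape(node: Node) -> str:
    """Serialize a sub-tree, ignoring the node's own repeat marker."""
    inner = ",".join(signature(c) for c in node.children)
    return f"{node.tag}({inner})" if inner else node.tag


def signature(node: Node) -> str:
    """Serialize a sub-tree as a simple tree-style regular expression."""
    return shape(node) + ("+" if node.repeat else "")


def generalize(node: Node) -> Node:
    """Return a generalized copy of the tree, working from leaves to root."""
    # Generalize the lower level first, so that siblings which differ only
    # in repetition counts end up with the same shape.
    kids = [generalize(c) for c in node.children]

    # Assumption: a "cluster" is a run of adjacent siblings with equal
    # shapes. Each run is merged into one sub-tree marked as repeating.
    merged: List[Node] = []
    for kid in kids:
        if merged and shape(merged[-1]) == shape(kid):
            merged[-1].repeat = True
        else:
            merged.append(kid)
    return Node(node.tag, merged, node.repeat)


# A table whose rows have different numbers of cells.
page = Node("table", [
    Node("tr", [Node("td"), Node("td")]),
    Node("tr", [Node("td"), Node("td"), Node("td")]),
])

print(signature(page))              # table(tr(td,td),tr(td,td,td))
print(signature(generalize(page)))  # table(tr(td+)+)
```

The two rows cannot be merged as given, because one has two cells and the other three. Once the cell level has been generalized to `td+`, both rows share the shape `tr(td+)` and collapse into a single repeating row, which is why the sketch processes lower levels before higher ones. A real implementation would need similarity-based clustering rather than exact equality; for example, a row with a single cell keeps the shape `tr(td)` here and would not merge with the others.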
