Boosted Wrapper Induction

  • Freitag D
  • Kushmerick N


Recent work in machine learning for information extraction has focused on two distinct sub-problems: the conventional problem of filling template slots from natural language text, and the problem of wrapper induction, learning simple extraction procedures (wrappers) for highly structured text such as Web pages produced by CGI scripts. For suitably regular domains, existing wrapper induction algorithms can efficiently learn wrappers that are simple and highly accurate, but the regularity bias of these algorithms makes them unsuitable for most conventional information extraction tasks. Boosting is a technique for improving the performance of a simple machine learning algorithm by repeatedly applying it to the training set with different example weightings. We describe an algorithm that learns simple, low-coverage wrapper-like extraction patterns, which we then apply to conventional information extraction problems using boosting. The result is BWI, a trainable information extraction system with a strong precision bias and F1 performance better than state-of-the-art techniques in many domains.
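The core idea of the abstract is boosting a weak learner whose hypotheses are small, wrapper-like patterns. The toy sketch below illustrates that idea, not the BWI algorithm itself. Assumptions (not from the paper): a "detector" is a hypothetical pair of at most one prefix token and one suffix token that must match around a token boundary. Every boundary position is a training example, labelled positive if a field starts there. Reweighting follows confidence-rated boosting in the style of Schapire and Singer, with detectors that abstain where they do not match. The names `fires`, `candidate_detectors`, `boost`, and `score` are illustrative only.

```python
import math
from typing import List, Set, Tuple

# Hypothetical detector: (prefix tokens, suffix tokens) around a boundary.
Detector = Tuple[Tuple[str, ...], Tuple[str, ...]]


def fires(det: Detector, tokens: List[str], i: int) -> bool:
    """True if the prefix ends exactly at boundary i and the suffix starts there."""
    prefix, suffix = det
    if i < len(prefix) or i + len(suffix) > len(tokens):
        return False
    return (tuple(tokens[i - len(prefix):i]) == prefix
            and tuple(tokens[i:i + len(suffix)]) == suffix)


def candidate_detectors(docs: List[List[str]], starts: List[Set[int]]) -> List[Detector]:
    """Assumed candidate set: one token of context on either side of each true start."""
    cands = set()
    for tokens, pos in zip(docs, starts):
        for i in pos:
            prefix = tuple(tokens[i - 1:i]) if i > 0 else ()
            suffix = tuple(tokens[i:i + 1])
            for det in ((prefix, suffix), (prefix, ()), ((), suffix)):
                if det != ((), ()):  # skip the pattern that fires everywhere
                    cands.add(det)
    return sorted(cands)


def boost(docs, starts, rounds=3, eps=None):
    """Confidence-rated boosting over abstaining boundary detectors."""
    examples = [(d, i, i in starts[d])
                for d, toks in enumerate(docs) for i in range(len(toks) + 1)]
    weights = [1.0 / len(examples)] * len(examples)
    eps = 1.0 / len(examples) if eps is None else eps  # smoothing for the confidence
    cands = candidate_detectors(docs, starts)
    if not cands:
        return []
    model = []
    for _ in range(rounds):
        best, best_obj, best_w = None, -math.inf, (0.0, 0.0)
        for det in cands:
            # Weight of positive / negative examples on which this detector fires.
            w_pos = w_neg = 0.0
            for (d, i, y), w in zip(examples, weights):
                if fires(det, docs[d], i):
                    if y:
                        w_pos += w
                    else:
                        w_neg += w
            obj = math.sqrt(w_pos) - math.sqrt(w_neg)
            if obj > best_obj:
                best, best_obj, best_w = det, obj, (w_pos, w_neg)
        w_pos, w_neg = best_w
        conf = 0.5 * math.log((w_pos + eps) / (w_neg + eps))
        model.append((best, conf))
        # Only examples where the chosen detector fires are reweighted;
        # correctly covered examples lose weight, mistakes gain weight.
        for k, (d, i, y) in enumerate(examples):
            if fires(best, docs[d], i):
                weights[k] *= math.exp(-conf * (1 if y else -1))
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def score(model, tokens: List[str], i: int) -> float:
    """Sum the confidences of all detectors that fire at boundary i."""
    return sum(c for det, c in model if fires(det, tokens, i))


docs = [["Speaker", ":", "Dr", ".", "Smith", "at", "3", "pm"],
        ["Speaker", ":", "Prof", ".", "Lee", "in", "room", "5"]]
starts = [{2}, {2}]  # the speaker field starts at token boundary 2
model = boost(docs, starts, rounds=3)
new = ["Speaker", ":", "Ms", ".", "Jones"]
print(model[0], score(model, new, 2), score(model, new, 0))
```

On this toy data the first detector chosen is the prefix-only pattern "`:` followed by anything". It covers both training starts, and it also fires before the unseen title "Ms" in a new document, so no single-example pattern is needed there. The sketch only scores field starts. A working extractor would also have to locate field ends and pair them with starts, which is omitted here.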
