Boosted Wrapper Induction

  • Freitag D
  • Kushmerick N

Abstract

Recent work in machine learning for information extraction has focused on two distinct sub-problems: the conventional problem of filling template slots from natural language text, and the problem of wrapper induction, learning simple extraction procedures (wrappers) for highly structured text such as Web pages produced by CGI scripts. For suitably regular domains, existing wrapper induction algorithms can efficiently learn wrappers that are simple and highly accurate, but the regularity bias of these algorithms makes them unsuitable for most conventional information extraction tasks. Boosting is a technique for improving the performance of a simple machine learning algorithm by repeatedly applying it to the training set with different example weightings. We describe an algorithm that learns simple, low-coverage wrapper-like extraction patterns, which we then apply to conventional information extraction problems using boosting. The result is BWI, a trainable information extraction system with a strong precision bias and F1 performance better than state-of-the-art techniques in many domains.
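
The boosting idea mentioned in the abstract (re-running a simple learner on the same training set with changing example weights) can be illustrated with a minimal sketch. This is a generic discrete AdaBoost loop with one-dimensional threshold rules standing in for the simple learner. It is not the BWI algorithm or its wrapper-like patterns, and the names train_stump, adaboost, and predict are hypothetical, chosen only for this illustration.

```python
import math

def train_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best threshold rule.

    A rule predicts `polarity` when x >= threshold and `-polarity` otherwise.
    This is an assumed stand-in weak learner, not a wrapper pattern.
    """
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= threshold else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, rounds):
    """Run the weak learner `rounds` times, reweighting examples each time."""
    n = len(xs)
    weights = [1.0 / n] * n  # start with uniform example weights
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)  # vote strength of this rule
        ensemble.append((alpha, threshold, polarity))
        # Increase the weight of misclassified examples, decrease the rest.
        weights = [
            w * math.exp(-alpha * y * (polarity if x >= threshold else -polarity))
            for w, x, y in zip(weights, xs, ys)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all learned rules; returns +1 or -1."""
    score = sum(
        alpha * (polarity if x >= threshold else -polarity)
        for alpha, threshold, polarity in ensemble
    )
    return 1 if score >= 0 else -1

# Toy data that no single threshold rule can fit: + + - - + +
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
model = adaboost(xs, ys, rounds=3)
print([predict(model, x) for x in xs])
```

On this toy set a single rule cannot separate the labels, but the weighted vote of three rules can, which is the effect the reweighting is meant to achieve.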

Authors

  • Dayne Freitag

  • Nicholas Kushmerick