In designing a tool for data extraction from semi-structured and unstructured text, we confront a problem that scholars have so far largely neglected: what if we need to find matches for several different patterns in a document and there are no keywords to support the search? Further, what if the same section matches several different patterns, or if matches partly overlap? How do we decide which match to pick? We argue that this is an important problem in data extraction and propose a solution based on a token classification system and weighted finite-state automata. © 2012 The authors and IOS Press. All rights reserved.
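To make the overlap problem concrete, the following is a minimal sketch and not the authors' implementation. It assumes three hypothetical token classes (`NUM`, `WORD`, `PUNCT`), three hypothetical patterns written as fixed class sequences with hand-picked weights, and it replaces the weighted finite-state automata named in the abstract with a simpler technique: weighted interval scheduling over all candidate matches, which selects the non-overlapping set with the highest total weight.

```python
import re
from bisect import bisect_right

# Hypothetical token classes; the abstract does not list the real ones.
TOKEN_CLASSES = [
    ("NUM", re.compile(r"\d+")),
    ("WORD", re.compile(r"[A-Za-z]+")),
    ("PUNCT", re.compile(r"[^\sA-Za-z\d]")),
]
TOKEN_RE = re.compile(r"\d+|[A-Za-z]+|[^\sA-Za-z\d]")

# Hypothetical patterns: (label, sequence of token classes, weight).
PATTERNS = [
    ("date", ["NUM", "PUNCT", "NUM", "PUNCT", "NUM"], 5.0),
    ("fraction", ["NUM", "PUNCT", "NUM"], 2.5),
    ("number", ["NUM"], 1.0),
]


def classify(text):
    """Tokenize text and label each token with its class."""
    tokens = []
    for m in TOKEN_RE.finditer(text):
        tok = m.group()
        for name, rx in TOKEN_CLASSES:
            if rx.fullmatch(tok):
                tokens.append((name, tok))
                break
    return tokens


def find_matches(tokens):
    """Return every (start, end, weight, label) match; end is exclusive."""
    classes = [cls for cls, _ in tokens]
    matches = []
    for label, seq, weight in PATTERNS:
        n = len(seq)
        for i in range(len(classes) - n + 1):
            if classes[i:i + n] == seq:
                matches.append((i, i + n, weight, label))
    return matches


def select_best(matches):
    """Weighted interval scheduling: best non-overlapping subset by total weight."""
    matches = sorted(matches, key=lambda m: m[1])
    ends = [m[1] for m in matches]
    # best[k] holds (total weight, chosen matches) using only the first k matches.
    best = [(0.0, [])]
    for k, (start, end, weight, label) in enumerate(matches):
        # Count earlier matches that end at or before this match starts.
        j = bisect_right(ends, start, 0, k)
        take = best[j][0] + weight
        if take > best[k][0]:
            best.append((take, best[j][1] + [(start, end, label)]))
        else:
            best.append(best[k])
    return best[-1][1]


tokens = classify("12/05/2012 and 3/4")
print(select_best(find_matches(tokens)))
```

In this input, the first five tokens match `date`, two overlapping `fraction` candidates, and three `number` candidates; the weights decide that the single `date` match wins. An interactive tool of the kind the abstract describes would presumably let the user adjust such weights, but that is an assumption rather than something the abstract states.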
Broman, P., & Thalheim, B. (2012). Interactive data extraction from semi-structured text. Frontiers in Artificial Intelligence and Applications, 237, 1–19. https://doi.org/10.3233/978-1-60750-992-9-1