This paper reports on a user-friendly development environment for terminology and information extraction that integrates into existing natural language processing infrastructure and aims to close a gap in the UIMA community. The tool supports domain experts in data-driven and manual terminology refinement and refactoring. It can propose new concepts and simple relations, and it includes an information extraction algorithm that considers the context of terms for disambiguation. By tightly integrating easy-to-use and technical tools for component development and resource management, the system is designed in particular to shorten the time needed for domain adaptation of such text processing components. The tool's search support furthers this goal and is helpful for building natural language processing modules in general. Specialized queries speed up several tasks, for example the detection of new terms and concepts, or simple quality estimation without gold standard documents. Built on Eclipse and the Apache UIMA framework, the development environment is modular and extensible. This paper describes the system's architecture and features with a focus on search support. Notably, it proposes a generic middleware component for queries in a UIMA-based workbench.
Toepfer, M., Fette, G., Beck, P. D., Kluegl, P., & Puppe, F. (2014). Integrated Tools for Query-driven Development of Light-weight Ontologies and Information Extraction Components. In Proceedings of the Workshop on Open Infrastructures and Analysis Frameworks for HLT, OIAF4HLT 2014 - Held at the 25th International Conference on Computational Linguistics, COLING 2014 (pp. 83–92). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w14-5210