DETEXA: Declarative Extensible Text Exploration and Analysis

Abstract

Metadata enrichment through text mining is becoming one of the most significant tasks in digital libraries. The rapid increase in open access publications has raised several new challenges: raw data are usually large, unstructured, and drawn from heterogeneous sources. In this paper, we introduce a text analysis framework that is implemented in extended SQL and exploits the scalability of modern database management systems. The framework is intended to support performant end-to-end text mining pipelines that cover data harvesting, cleaning, processing, and text analysis in a single workflow. SQL was selected for its declarative nature, which enables fast experimentation and makes it possible to build APIs, so that domain experts can edit text mining workflows through easy-to-use graphical interfaces. Our experimental analysis demonstrates that the proposed framework is effective and achieves significant speedup in common use cases compared to other popular approaches.
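
To make the idea of a text mining pipeline written in extended SQL more concrete, here is a minimal sketch. It is not DETEXA's actual API, which the abstract does not describe. It assumes SQLite as the database engine and uses Python's standard `sqlite3` module to register user-defined functions. The names `clean_text`, `has_term`, `build_connection`, `run_pipeline`, and the `docs` table are hypothetical and chosen only for illustration.

```python
import re
import sqlite3


def clean_text(raw):
    """Hypothetical cleaning UDF: lowercase, drop punctuation, collapse whitespace."""
    if raw is None:
        return None
    text = re.sub(r"[^a-z0-9\s]", " ", raw.lower())
    return re.sub(r"\s+", " ", text).strip()


def has_term(text, term):
    """Hypothetical analysis UDF: 1 if `term` occurs as a whole token in `text`."""
    if text is None or term is None:
        return 0
    return int(term in text.split())


def build_connection():
    """Create an in-memory database, register the UDFs, and load sample rows."""
    conn = sqlite3.connect(":memory:")
    # Registering Python callables is what "extends" SQL in this sketch.
    conn.create_function("clean_text", 1, clean_text)
    conn.create_function("has_term", 2, has_term)
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO docs (body) VALUES (?)",
        [
            ("Open-Access   publications are GROWING!",),
            ("Metadata enrichment, via text mining.",),
            ("Nothing relevant here.",),
        ],
    )
    return conn


def run_pipeline(conn, term):
    """Clean and filter documents in one declarative query."""
    query = """
        SELECT id, cleaned
        FROM (SELECT id, clean_text(body) AS cleaned FROM docs)
        WHERE has_term(cleaned, ?) = 1
        ORDER BY id
    """
    return conn.execute(query, (term,)).fetchall()


if __name__ == "__main__":
    connection = build_connection()
    for row in run_pipeline(connection, "mining"):
        print(row)
```

In this sketch the cleaning and filtering steps live inside a single query, so changing the workflow means editing SQL text rather than procedural code. That is the kind of property the abstract attributes to a declarative approach.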

Cite

Foufoulas, Y., Zacharia, E., Dimitropoulos, H., Manola, N., & Ioannidis, Y. (2022). DETEXA: Declarative Extensible Text Exploration and Analysis. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13541 LNCS, pp. 107–119). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-16802-4_9
