Historical topic modeling and the exploration of semantic concepts in a large corpus of unstructured text remain a hard, open problem. Despite advances in natural language processing tools, statistical linguistic models, graph theory, and visualization, there is no framework that combines these piecewise tools under one roof. We designed and constructed the Semantic Network Analysis Pipeline (SNAP), available as an open-source web service, which implements the workflow a data scientist needs to explore historical semantic concepts in a text corpus. We define a graph-theoretic notion of a semantic concept as a flow of closely related tokens through the corpus of text. The modular pipeline processes text using natural language processing tools and statistical content narrowing, creates semantic networks from lexical token chaining, performs social network analysis of the token networks, and creates a 3D visualization of the semantic concept flows through the corpus for interactive concept exploration. Finally, we illustrate the framework's utility for extracting information from three text corpora: Herman Melville's novel Moby Dick, the transcript of the 2015-2016 United States (U.S.) Senate Hearings on Environment and Public Works, and the Australian Broadcasting Corporation's short news articles on rural and science topics.
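The core graph idea in the abstract (token networks, a network-analysis measure, and a concept "flowing" through the corpus) can be illustrated with a small sketch. This is not SNAP's implementation: it assumes that "lexical token chaining" can be approximated by linking tokens that co-occur within a fixed window, uses degree centrality as one example of a social network analysis measure, and treats a concept's "flow" as the count of its tokens in each corpus segment. The functions `build_token_network`, `degree_centrality`, and `concept_flow` and the `window` parameter are hypothetical names chosen for illustration.

```python
from collections import Counter, defaultdict


def build_token_network(sentences, window=2):
    """Hypothetical stand-in for lexical token chaining: link each token to
    the next `window` tokens in the same sentence; edge weight = co-occurrences."""
    edges = Counter()
    for tokens in sentences:
        for i, a in enumerate(tokens):
            for b in tokens[i + 1 : i + 1 + window]:
                if a != b:
                    # Store undirected edges with a canonical (sorted) key.
                    edges[tuple(sorted((a, b)))] += 1
    return edges


def degree_centrality(edges):
    """Fraction of the other nodes each token is directly linked to."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


def concept_flow(segments, concept_tokens):
    """Assumed notion of 'flow': how many concept tokens appear in each segment."""
    concept = set(concept_tokens)
    return [sum(1 for t in seg if t in concept) for seg in segments]


if __name__ == "__main__":
    sentences = [["whale", "ship", "sea"], ["ship", "captain", "whale"]]
    edges = build_token_network(sentences, window=2)
    centrality = degree_centrality(edges)
    # Take the most central tokens as a toy "concept" and trace it per sentence.
    top = sorted(centrality, key=centrality.get, reverse=True)[:2]
    print(edges)
    print(centrality)
    print(concept_flow(sentences, top))
```

On the toy input, "whale" and "ship" are linked to every other token and so get centrality 1.0, while "sea" and "captain" get 2/3. A real pipeline would add the NLP preprocessing and statistical narrowing the abstract mentions (tokenization, stop-word removal, frequency filtering) before building the network, and would likely use a graph library rather than hand-rolled dictionaries.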
Cenek, M., Bulkow, R., Pak, E., Oyster, L., Ching, B., & Mulagada, A. (2019). Semantic network analysis pipeline-Interactive text mining framework for exploration of semantic flows in large corpus of text. Applied Sciences (Switzerland), 9(24). https://doi.org/10.3390/app9245302