Automatic Word Embeddings-Based Glossary Term Extraction from Large-Sized Software Requirements

10Citations
Citations of this article
11Readers
Mendeley users who have this article in their library.
Get full text

Abstract

[Context and Motivation] Requirements glossary defines specialized and technical terms used in a requirements document. A requirements glossary helps in improving the quality and understandability of requirements documents. [Question/Problem] Manual extraction of glossary terms from a large body of requirements is an expensive and time-consuming task. This paper proposes a fundamentally new approach for automated extraction of glossary terms from large-sized requirements documents. [Principal Ideas/Result] Firstly, our technique extracts the candidate glossary terms by applying text chunking. Next, we apply a novel word embeddings based semantic filter for reducing the number of candidate glossary terms. Since word embeddings are very effective in identifying terms that are semantically very similar, this filter ensures that only domain-specific terms are present in the final set of glossary terms. We create a domain-specific reference corpus for home automation by Wikipedia crawling and use it for computing the semantic similarity scores of candidate glossary terms. We apply our technique to a large-sized requirements document, i.e., a CrowdRE dataset with around 3000 crowd-generated requirements for smart home applications. Semantic filtering reduces the number of glossary terms by 92.7%. To evaluate the quality of our extracted glossary terms we manually create the ground truth data from CrowdRE dataset and use it for computing precision and recall. Additionally, we also compute the requirements coverage of these extracted glossary terms. [Contributions] Our detailed experiments show that word embeddings based semantic filtering can be very useful for extracting glossary terms from a large body of requirements.

Cite

CITATION STYLE

APA

Mishra, S., & Sharma, A. (2020). Automatic Word Embeddings-Based Glossary Term Extraction from Large-Sized Software Requirements. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12045 LNCS, pp. 203–218). Springer. https://doi.org/10.1007/978-3-030-44429-7_15

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free