Identifying document relevance to Sustainable Development Goals using NLG

Abstract

Artificial Intelligence (AI) and, specifically, Natural Language Processing (NLP) techniques are considered catalysts of the sustainable development of human society because they provide information technology support for attaining the targets of the Sustainable Development Goals (SDGs). The current study investigates the applicability of generative language models for identifying the representation of SDGs in scientific publications indexed by the Scopus database. The study is an initial step in developing an NLP-based framework for evaluating the attainment of SDGs from documents written in human language. Because SDGs are articulated in natural language in sentences of different lengths, comparing their descriptions with summaries of text documents is expected to identify and quantify the relevance of documents to each SDG and its targets more comprehensively than traditional keyword search. The study is based on abstractive summarization and follows the methodological framework presented in Figure 1. Neural language models based on the Transformer architecture are available as open-source software. Several model implementations were selected from the HuggingFace platform library and evaluated on the computational resources they require and on their performance on a small corpus of documents. Two models, BART and T5, were selected for the study because of their relatively low computational cost. These models are pre-trained on large corpora of general-purpose text documents. Given the multifaceted nature of the Sustainability concept, which covers the most important issues of human society, the models were assumed to be suitable for text-generation NLP tasks in the selected problem domain. Standard NLP similarity measures are not sufficient to capture the semantics of the texts accurately, so Word2Vec similarity was chosen to evaluate the relevance of the selected documents to the target texts.
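As an illustration of the similarity step, the following minimal sketch scores a summary against a target text by averaging word vectors and taking the cosine of the two averages. This is one common way to turn word embeddings into a text-level score; the abstract does not state how the authors computed their Word2Vec similarity, so the method here is an assumption. The function names and the tiny three-dimensional vectors are hypothetical and invented for the example; a real run would load trained Word2Vec embeddings.

```python
import math

# Hypothetical 3-dimensional word vectors, invented for illustration only.
# A real run would load trained Word2Vec embeddings instead.
TOY_VECTORS = {
    "water": [0.9, 0.1, 0.0],
    "sanitation": [0.8, 0.2, 0.1],
    "clean": [0.7, 0.3, 0.0],
    "poverty": [0.1, 0.9, 0.2],
    "income": [0.0, 0.8, 0.3],
}


def mean_vector(text, vectors):
    """Average the vectors of the in-vocabulary tokens of `text`.

    Returns None when no token of the text is in the vocabulary.
    """
    tokens = [t for t in text.lower().split() if t in vectors]
    if not tokens:
        return None
    dim = len(next(iter(vectors.values())))
    return [sum(vectors[t][i] for t in tokens) / len(tokens) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def text_similarity(summary, target, vectors=TOY_VECTORS):
    """Cosine similarity between the mean word vectors of two texts."""
    u = mean_vector(summary, vectors)
    v = mean_vector(target, vectors)
    if u is None or v is None:
        return 0.0  # no shared vocabulary with the embedding model
    return cosine(u, v)


if __name__ == "__main__":
    summary = "clean water"
    print(text_similarity(summary, "water sanitation"))
    print(text_similarity(summary, "poverty income"))
```

With these toy vectors, the summary "clean water" scores higher against "water sanitation" than against "poverty income", which is the behaviour a semantic measure is expected to show where keyword overlap alone would be weaker.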
To ensure the credibility of the documents, 988 peer-reviewed scientific publications that appeared between January 2022 and May 2023 were extracted from Elsevier's Scopus database. The short description of each SDG was downloaded from the United Nations website. Analysis of the authors’ keywords confirmed that the topics dominating the corpus are relevant to the sustainability domain and cover all three pillars of the Sustainability concept: environment, economy, and society. The abstracts of the extracted papers formed the corpus of investigated publications. Each abstract was fed into the two selected models for summarization. For each summary obtained, its similarity to each of the 17 SDGs was calculated as a Word2Vec score. The similarity scores were then analysed using standard quantitative aggregation methods, and the results led to conclusions on the representation of SDGs in recent scientific publications. The study found that all 17 SDGs are reflected in recent scientific publications. Goal 17 was the most represented SDG, while Goal 16 was not a focus of recent studies. The study resulted in an approach to automatically processing a corpus of text documents to identify the relevance of documents to SDGs through quantitative analysis of semantic similarity scores. The approach can be applied to other problem domains and to corpora extracted from various sources, provided that corresponding target texts are available.
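The aggregation step can be sketched as follows, assuming each summary has already been assigned one similarity score per goal. The abstract does not name the aggregation methods used, so the mean score per goal and the count of summaries for which a goal scores highest are assumptions chosen for illustration. The function names and the scores are hypothetical, and only three goals are shown to keep the example short.

```python
from collections import Counter
from statistics import mean


def aggregate_by_goal(scores):
    """Mean similarity per goal.

    `scores` is a list with one dict per summary, mapping goal label -> score.
    """
    goals = scores[0].keys()
    return {goal: mean(doc[goal] for doc in scores) for goal in goals}


def rank_goals(mean_scores):
    """Goal labels ordered from highest to lowest mean similarity."""
    return sorted(mean_scores, key=mean_scores.get, reverse=True)


def top_goal_counts(scores):
    """Number of summaries for which each goal has the highest score."""
    return Counter(max(doc, key=doc.get) for doc in scores)


# Invented scores for three summaries and three goals, for illustration only.
toy_scores = [
    {"SDG 1": 0.50, "SDG 2": 0.75, "SDG 3": 0.25},
    {"SDG 1": 0.50, "SDG 2": 0.25, "SDG 3": 0.75},
    {"SDG 1": 0.50, "SDG 2": 1.00, "SDG 3": 0.00},
]

if __name__ == "__main__":
    means = aggregate_by_goal(toy_scores)
    print(rank_goals(means))
    print(top_goal_counts(toy_scores))
```

The two views can disagree: a goal with a moderate score on every summary may rank well by mean while never being the top goal for any single summary, so reporting both gives a fuller picture of representation.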

Erechtchoukova, M. G., & Safwat, N. (2023). Identifying document relevance to Sustainable Development Goals using NLG. In Proceedings of the International Congress on Modelling and Simulation, MODSIM (pp. 225–231). Modelling and Simulation Society of Australia and New Zealand Inc. (MSSANZ). https://doi.org/10.36334/modsim.2023.erechtchoukova
