Expansive data, extensive model: Investigating discussion topics around LLM through unsupervised machine learning in academic papers and news

Hae Sun Jung; Haein Lee; Young Seok Woo; Seo Yeon Baek; Jang Hyun Kim

Journal Article, Open Access


PLoS ONE (2024) 19(5)

DOI: 10.1371/journal.pone.0304680

2 Citations

33 Readers


Abstract

This study presents a comprehensive evaluation of topic modeling methods applied to discourse on large language models (LLMs), using data obtained from Web of Science and LexisNexis from June 1, 2020, to December 31, 2023. The data were collected with queries focusing on LLMs, including “Large language model,” “LLM,” and “ChatGPT.” Four topic modeling approaches were compared: latent Dirichlet allocation (LDA), nonnegative matrix factorization (NMF), combined topic models (CTM), and bidirectional encoder representations from Transformers topic (BERTopic). Each was evaluated on two performance metrics, topic diversity and topic coherence, computed separately for each platform. BERTopic demonstrated superior performance in both diversity and coherence on both LexisNexis and Web of Science. The experimental results reveal that news articles maintain balanced coverage across various topics and mainly focus on efforts to utilize LLMs in specialized domains. Conversely, research papers are more concise and concentrate on the technology itself, emphasizing technical aspects. These insights make it possible to examine the future direction of LLMs and the challenges they should tackle, and could offer considerable value to enterprises that use LLMs to deliver services.
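The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes common definitions that the abstract itself does not confirm: topic diversity as the fraction of unique words among the top words of all topics, and coherence as the average normalized pointwise mutual information (NPMI) of word pairs within each topic, estimated from document-level co-occurrence. The function names, the `top_k` parameter, and the handling of edge cases are illustrative choices, not the authors' implementation.

```python
from itertools import combinations
from math import log


def topic_diversity(topics, top_k=10):
    """Fraction of unique words among the top_k words of every topic.

    1.0 means no word is shared between topics; lower values mean overlap.
    """
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        return 0.0
    return len(set(top_words)) / len(top_words)


def npmi_coherence(topics, documents, top_k=10):
    """Mean NPMI over word pairs in each topic, averaged over topics.

    Probabilities are estimated as the share of documents containing
    the word (or both words). Assumed conventions: a pair that never
    co-occurs scores -1, and a pair present in every document scores 0.
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        hits = sum(1 for doc in doc_sets if all(w in doc for w in words))
        return hits / n_docs

    topic_scores = []
    for topic in topics:
        pair_scores = []
        for a, b in combinations(topic[:top_k], 2):
            p_ab = prob(a, b)
            if p_ab == 0.0:
                pair_scores.append(-1.0)
            elif p_ab == 1.0:
                pair_scores.append(0.0)
            else:
                pmi = log(p_ab / (prob(a) * prob(b)))
                pair_scores.append(pmi / -log(p_ab))
        if pair_scores:
            topic_scores.append(sum(pair_scores) / len(pair_scores))
    if not topic_scores:
        return 0.0
    return sum(topic_scores) / len(topic_scores)


if __name__ == "__main__":
    # Toy topics and tokenized documents, invented for illustration only.
    toy_topics = [["model", "language", "chatgpt"],
                  ["model", "health", "patient"]]
    toy_docs = [["model", "language", "chatgpt"],
                ["model", "health", "patient"],
                ["language", "chatgpt"],
                ["health", "patient"]]
    print("diversity:", topic_diversity(toy_topics, top_k=3))
    print("coherence:", npmi_coherence(toy_topics, toy_docs, top_k=3))
```

Under these definitions, higher diversity means topics share fewer top words, and higher coherence means the words within a topic tend to appear in the same documents. Published toolkits may use other coherence variants, so scores from this sketch are not directly comparable to those reported in the paper.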


Citation (APA)

Jung, H. S., Lee, H., Woo, Y. S., Baek, S. Y., & Kim, J. H. (2024). Expansive data, extensive model: Investigating discussion topics around LLM through unsupervised machine learning in academic papers and news. PLoS ONE, 19(5). https://doi.org/10.1371/journal.pone.0304680

Readers' Seniority

Professor / Associate Prof.: 4 (50%)
Researcher: 3 (38%)
PhD / Post grad / Masters / Doc: 1 (13%)

Readers' Discipline

Medicine and Dentistry: 4 (44%)
Computer Science: 2 (22%)
Business, Management and Accounting: 2 (22%)
Engineering: 1 (11%)

Article Metrics

Mentions

News Mentions: 1

Social Media

Shares, Likes & Comments: 36
