Distantly Supervised Course Concept Extraction in MOOCs with Academic Discipline

Abstract

With the rapid growth of Massive Open Online Courses (MOOCs), manually extracting the high-quality concepts taught in a course, which help learners grasp its essence, has become expensive and time-consuming. In this paper, we propose to extract course concepts automatically using distant supervision, which generates labels by matching course text against an easily accessible dictionary and thus eliminates the heavy work of human annotation. However, this matching process suffers from severely noisy and incomplete annotations because the dictionary is limited and MOOCs are diverse. To tackle these challenges, we present DS-MOCE, a novel three-stage framework that leverages the power of pre-trained language models explicitly and implicitly and employs discipline-embedding models with a self-training strategy based on label generation refinement across different domains. We also provide an expert-labeled dataset spanning 20 academic disciplines. Experimental results demonstrate the superiority of DS-MOCE over state-of-the-art distantly supervised methods (with a 7% absolute F1 score improvement). Code and data are available at https://github.com/THU-KEG/MOOC-NER.
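
The dictionary-matching step that the abstract describes can be illustrated with a minimal sketch. This is not the DS-MOCE framework or the authors' code. It only shows the naive distant-labeling baseline, assuming whitespace tokenization, BIO tags, and greedy longest-match lookup. The function name `distant_labels`, the `max_len` parameter, and the toy dictionary are all hypothetical.

```python
from typing import List, Set, Tuple


def distant_labels(
    tokens: List[str],
    dictionary: Set[Tuple[str, ...]],
    max_len: int = 4,  # hypothetical cap on concept length, in tokens
) -> List[str]:
    """Assign BIO tags by greedy longest-match lookup in a concept dictionary."""
    labels = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first so that a full concept
        # wins over any shorter concept that starts at the same token.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in dictionary:
                matched = n
                break
        if matched:
            labels[i] = "B"
            for j in range(i + 1, i + matched):
                labels[j] = "I"
            i += matched
        else:
            i += 1
    return labels


# Hypothetical toy dictionary; a real one would come from an external resource.
TOY_DICTIONARY = {("binary", "search", "tree"), ("hash", "table"), ("tree",)}

tokens = "A red-black tree is a balanced binary search tree".split()
labels = distant_labels(tokens, TOY_DICTIONARY)
for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

In this toy run, "binary search tree" is tagged in full, but "red-black tree" is missing from the dictionary, so only "tree" is tagged. That is a small instance of the noisy and incomplete annotations the abstract refers to.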

Cite

Lu, M., Wang, Y., Yu, J., Du, Y., Hou, L., & Li, J. (2023). Distantly Supervised Course Concept Extraction in MOOCs with Academic Discipline. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 13044–13059). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.acl-long.729
