Pressure Test: Finding Appropriate Data Size for Practice in Data Science Education

Abstract

Data science topics, such as data analytics, data mining, and machine learning, have become a popular part of the curriculum in information technology education. Lectures on these topics cannot stand alone without coding practice on real-world data sets. Some instructors prefer small data sets for in-class practice or assignments, which limits students' experimental experience and may even mislead them. Others assign large data sets, but students may not be able to tolerate the running time, which depends on several factors (e.g., data size, algorithm complexity, and computing power). In this paper, we first examine students' preferences on the scale of data sets for practice in data science courses, and then perform an experimental analysis by running different data science algorithms on both student laptops and personal/office computers, in order to suggest appropriate data sizes for practice in multiple scenarios (e.g., in-class practice, assignments, class projects, and research projects). We believe that our findings can help instructors prepare and assign real-world data sets to students in data science curricula.

Cite

Zheng, Y., Liu, A., & Zheng, S. (2022). Pressure Test: Finding Appropriate Data Size for Practice in Data Science Education. In SIGITE 2022 - Proceedings of the 23rd Annual Conference on Information Technology Education (pp. 142–149). Association for Computing Machinery, Inc. https://doi.org/10.1145/3537674.3554748
