This chapter introduces parallel processing with education data. As the amount of education data continues to grow, new methods are required to process it efficiently. The chapter gives a history of popular parallel computing frameworks and discusses the types of problems that map easily onto them. It then describes an example machine-learning problem and compares a single-threaded pipeline with a parallel pipeline that uses Apache Spark. We hope this information is useful to other practitioners looking to use Apache Spark to expand their models to include more students and more data.
CITATION STYLE
Lewkow, N., & Feild, J. (2018). Using Apache Spark for modeling student behavior at scale. In Lecture Notes in Educational Technology (pp. 169–176). Springer International Publishing. https://doi.org/10.1007/978-981-13-0650-1_9