Mining career paths from large resume databases

Theodoros Lappas

Journal Article (Open Access)

ACM Transactions on Knowledge Discovery from Data (2020) 14(3)

DOI: 10.1145/3379984

Abstract

The emergence of online professional platforms, such as LinkedIn and Indeed, has led to unprecedented volumes of rich resume data that have revolutionized the study of careers. One of the most prevalent problems in this space is the extraction of prototype career paths from a workforce. Previous research has consistently relied on a two-step approach to tackle this problem. The first step computes the pairwise distances between all the career sequences in the database. The second step uses the distance matrix to create clusters, with each cluster representing a different prototype path. As we demonstrate in this work, this approach faces two significant challenges when applied to large resume databases. First, the overwhelming diversity of job titles in the modern workforce prevents the accurate evaluation of the distance between career sequences. Second, the clustering step of the standard approach leads to highly heterogeneous clusters, due to its inability to handle categorical sequences and its sensitivity to outliers. This leads to non-representative centroids and spurious prototype paths that do not accurately represent the actual groups in the workforce. Our work addresses these two challenges and has practical implications for the numerous researchers and practitioners working on the analysis of career data across domains.
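The two-step approach summarized in the abstract (pairwise distances, then clustering on the distance matrix) can be sketched roughly as follows. This is an illustrative sketch only, not the method or code from the article: the choice of Levenshtein edit distance over exact job-title strings, the naive k-medoids routine with farthest-first initialization, the function names, and the toy career sequences are all assumptions made here for illustration.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two sequences of job titles (assumed metric)."""
    prev = list(range(len(b) + 1))
    for i, title_a in enumerate(a, start=1):
        curr = [i]
        for j, title_b in enumerate(b, start=1):
            substitution = prev[j - 1] + (title_a != title_b)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1]


def distance_matrix(paths):
    """Step 1: pairwise distances between all career sequences."""
    n = len(paths)
    dist = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = edit_distance(paths[i], paths[j])
    return dist


def k_medoids(dist, k, max_iter=100):
    """Step 2: naive k-medoids on a precomputed distance matrix.

    Returns a dict mapping each medoid index to the indices in its cluster.
    """
    n = len(dist)
    # Farthest-first initialization: start at point 0, then repeatedly add
    # the point that is farthest from its nearest already-chosen medoid.
    medoids = [0]
    while len(medoids) < k:
        medoids.append(
            max(range(n), key=lambda i: min(dist[i][m] for m in medoids))
        )
    clusters = {}
    for _ in range(max_iter):
        # Assignment: each sequence joins its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            nearest = min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update: the new medoid minimizes total distance within its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][o] for o in members))
            if members else m
            for m, members in clusters.items()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return clusters


# Hypothetical toy career sequences, invented for this example.
paths = [
    ["intern", "analyst", "manager"],
    ["intern", "analyst", "senior analyst", "manager"],
    ["analyst", "manager"],
    ["nurse", "head nurse"],
    ["nurse", "head nurse", "director of nursing"],
]

dist = distance_matrix(paths)
clusters = k_medoids(dist, k=2)
for medoid, members in clusters.items():
    print("prototype:", paths[medoid], "members:", members)
```

Because this sketch compares job titles by exact string match, two titles that differ only in wording count as entirely different positions, which gives a small-scale feel for the first challenge the abstract raises about job-title diversity.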

Cite (APA)

Lappas, T. (2020). Mining career paths from large resume databases. ACM Transactions on Knowledge Discovery from Data, 14(3). https://doi.org/10.1145/3379984
