Mining career paths from large resume databases

12Citations
Citations of this article
24Readers
Mendeley users who have this article in their library.
Get full text

Abstract

The emergence of online professional platforms, such as LinkedIn and Indeed, has led to unprecedented volumes of rich resume data that have revolutionized the study of careers. One of the most prevalent problems in this space is the extraction of prototype career paths from a workforce. Previous research has consistently relied on a two-step approach to tackle this problem. The first step computes the pairwise distances between all the career sequences in the database. The second step uses the distance matrix to create clusters, with each cluster representing a different prototype path. As we demonstrate in this work, this approach faces two significant challenges when applied on large resume databases. First, the overwhelming diversity of job titles in the modern workforce prevents the accurate evaluation of distance between career sequences. Second, the clustering step of the standard approach leads to highly heterogeneous clusters, due to its inability to handle categorical sequences and sensitivity to outliers. This leads to non-representative centroids and spurious prototype paths that do not accurately represent the actual groups in the workforce. Our work addresses these two challenges and has practical implications for the numerous researchers and practitioners working on the analysis of career data across domains.

Cite

CITATION STYLE

APA

Lappas, T. (2020). Mining career paths from large resume databases. ACM Transactions on Knowledge Discovery from Data, 14(3). https://doi.org/10.1145/3379984

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free