lifelines: survival analysis in Python

  • Davidson-Pilon C

Abstract

One frustration of data scientists and statisticians is moving between programming languages to complete projects. The two most common are R and Python. For example, a survival analysis model may be fit using R's survival package (Terry M Therneau, 2015) or flexsurv (Christopher Jackson, 2016), but needs to be deployed into a Python system. Previously, this may have meant using Python libraries to call out to R (still shuffling between two languages, but now abstracted), or translating the fitted model to Python (likely to introduce bugs). Libraries like Patsy (Nathaniel J. Smith et al., 2018) and Statsmodels (Skipper Seabold et al., 2017) have helped data scientists and statisticians work solely in Python.

lifelines extends the toolbox of data scientists so they can perform common survival analysis tasks in Python. Its value comes from its intuitive and well-documented API, its flexibility in modeling novel hazard functions, and its easy deployment in production systems and research stations alongside other Python libraries. The internals of lifelines use some novel approaches to survival analysis algorithms, like automatic differentiation and meta-algorithms. We present high-level descriptions of these novel approaches next.

One goal of lifelines is to be pure Python so as to make installation and maintenance simple. This can be at odds with users' desire for high-performance model fitting. Though Python is becoming more and more performant, datasets are growing at a faster rate. Internally, lifelines uses some interesting tricks to improve performance. These approaches can be applied to other Python libraries.
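For example, the Cox proportional hazard model with Efron's tie-handling method has a complicated partial likelihood (Wikipedia contributors, 2019):

$$\ell(\beta) = \sum_j \left( \sum_{i \in H_j} X_i \cdot \beta - \sum_{\ell=0}^{m-1} \log\left( \sum_{i: Y_i \ge t_j} \theta_i - \frac{\ell}{m} \sum_{i \in H_j} \theta_i \right) \right),$$

where $\theta_i = \exp(X_i \cdot \beta)$, and the Hessian matrix is:

$$\ell''(\beta) = -\sum_j \sum_{\ell=0}^{m-1} \left( \frac{\sum_{i: Y_i \ge t_j} \theta_i X_i X_i' - \frac{\ell}{m} \sum_{i \in H_j} \theta_i X_i X_i'}{\phi_{j,\ell,m}} - \frac{Z_{j,\ell,m} Z_{j,\ell,m}'}{\phi_{j,\ell,m}^2} \right),$$

where

$$\phi_{j,\ell,m} = \sum_{i: Y_i \ge t_j} \theta_i - \frac{\ell}{m} \sum_{i \in H_j} \theta_i, \qquad Z_{j,\ell,m} = \sum_{i: Y_i \ge t_j} \theta_i X_i - \frac{\ell}{m} \sum_{i \in H_j} \theta_i X_i.$$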
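To make the partial likelihood above concrete, here is a minimal sketch that evaluates it term by term using only the Python standard library. It is not the lifelines implementation, which the abstract does not show. The function name `efron_log_partial_likelihood` and its argument names are hypothetical, and the sketch assumes the usual convention that H_j is the set of subjects with an observed event at time t_j and that m is the size of H_j.

```python
import math
from collections import defaultdict


def efron_log_partial_likelihood(beta, X, durations, events):
    """Evaluate the Efron-corrected Cox partial log-likelihood directly.

    beta      : list of coefficients
    X         : list of covariate rows, one list per subject
    durations : observed times Y_i
    events    : 1 if the event was observed, 0 if censored
    """
    # Linear predictor X_i . beta and theta_i = exp(X_i . beta) per subject.
    xb = [sum(x_k * b_k for x_k, b_k in zip(row, beta)) for row in X]
    theta = [math.exp(v) for v in xb]

    # H_j (assumed definition): subjects with an observed event at time t_j.
    tied = defaultdict(list)
    for i, (t, e) in enumerate(zip(durations, events)):
        if e:
            tied[t].append(i)

    total = 0.0
    for t_j, H_j in tied.items():
        m = len(H_j)  # assumed: m is the number of tied events at t_j
        # Sum of theta_i over the risk set {i : Y_i >= t_j}.
        risk_sum = sum(th for th, y in zip(theta, durations) if y >= t_j)
        # Sum of theta_i over the tied events H_j.
        tie_sum = sum(theta[i] for i in H_j)
        total += sum(xb[i] for i in H_j)
        for l in range(m):
            total -= math.log(risk_sum - (l / m) * tie_sum)
    return total
```

The loops mirror the formula rather than aim for speed: the risk-set sum is recomputed for every distinct event time, so the cost grows roughly with the number of subjects times the number of distinct event times. Hand-deriving and coding the gradient and Hessian in the same style is where errors tend to creep in, which is the motivation the text gives for automatic differentiation.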

Cite

Davidson-Pilon, C. (2019). lifelines: survival analysis in Python. Journal of Open Source Software, 4(40), 1317. https://doi.org/10.21105/joss.01317
