Abstract
One frustration of data scientists and statisticians is moving between programming languages to complete projects. The two most common are R and Python. For example, a survival analysis model may be fit using R's survival package (Terry M Therneau, 2015) or flexsurv (Christopher Jackson, 2016), but need to be deployed into a Python system. Previously, this may have meant using Python libraries to call out to R (still shuffling between two languages, but now abstracted), or translating the fitted model to Python (likely to introduce bugs). Libraries like Patsy (Nathaniel J. Smith et al., 2018) and Statsmodels (Skipper Seabold et al., 2017) have helped data scientists and statisticians work solely in Python. lifelines extends the toolbox of data scientists so they can perform common survival analysis tasks in Python. Its value comes from its intuitive and well-documented API, its flexibility in modeling novel hazard functions, and its easy deployment in production systems and research stations alongside other Python libraries. The internals of lifelines use some novel approaches to survival analysis algorithms, such as automatic differentiation and meta-algorithms. We present high-level descriptions of these novel approaches next.

One goal of lifelines is to be pure Python, so as to make installation and maintenance simple. This can be at odds with users' desire for high-performance model fitting. Though Python is becoming more and more performant, datasets are growing at a faster rate. Internally, lifelines uses some interesting tricks to improve performance. These approaches can be applied to other Python libraries.
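For example, the Cox proportional hazard model with Efron's tie-handling method has a complicated partial likelihood (Wikipedia contributors, 2019):

$$\ell(\beta) = \sum_j \left( \sum_{i \in H_j} X_i \cdot \beta - \sum_{\ell=0}^{m-1} \log\left( \sum_{i : Y_i \ge t_j} \theta_i - \frac{\ell}{m} \sum_{i \in H_j} \theta_i \right) \right),$$

where $\theta_i = \exp(X_i \cdot \beta)$, and the Hessian matrix is:

$$\ell''(\beta) = -\sum_j \sum_{\ell=0}^{m-1} \left( \frac{\sum_{i : Y_i \ge t_j} \theta_i X_i X_i' - \frac{\ell}{m} \sum_{i \in H_j} \theta_i X_i X_i'}{\phi_{j,\ell,m}} - \frac{Z_{j,\ell,m} Z_{j,\ell,m}'}{\phi_{j,\ell,m}^2} \right),$$

where

$$\phi_{j,\ell,m} = \sum_{i : Y_i \ge t_j} \theta_i - \frac{\ell}{m} \sum_{i \in H_j} \theta_i, \qquad Z_{j,\ell,m} = \sum_{i : Y_i \ge t_j} \theta_i X_i - \frac{\ell}{m} \sum_{i \in H_j} \theta_i X_i.$$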
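To make the formulas concrete, here is a minimal pure-Python sketch that evaluates the Efron log partial likelihood and its closed-form second derivative, then compares the latter against a finite-difference estimate. It is an illustration under stated assumptions, not the lifelines implementation: it handles a single covariate (so the Hessian is a scalar), it takes $H_j$ to be the set of subjects with an observed event at $t_j$ and $m$ to be the size of that set, and the function names and toy data are hypothetical.

```python
import math


def efron_loglik_and_hessian(times, events, x, beta):
    """Efron log partial likelihood and its second derivative.

    Assumes one covariate per subject, so beta and the Hessian are scalars.
    """
    theta = [math.exp(xi * beta) for xi in x]
    # Unique times t_j with at least one observed event.
    event_times = sorted({t for t, e in zip(times, events) if e})
    loglik, hessian = 0.0, 0.0
    for tj in event_times:
        risk = [i for i, t in enumerate(times) if t >= tj]  # i : Y_i >= t_j
        tied = [i for i in risk if times[i] == tj and events[i]]  # H_j
        m = len(tied)
        # Weighted sums over the risk set and over the tied events.
        risk_0 = sum(theta[i] for i in risk)
        risk_1 = sum(theta[i] * x[i] for i in risk)
        risk_2 = sum(theta[i] * x[i] ** 2 for i in risk)
        tied_0 = sum(theta[i] for i in tied)
        tied_1 = sum(theta[i] * x[i] for i in tied)
        tied_2 = sum(theta[i] * x[i] ** 2 for i in tied)
        loglik += sum(x[i] * beta for i in tied)
        for l in range(m):
            frac = l / m
            phi = risk_0 - frac * tied_0  # phi_{j,l,m}
            z = risk_1 - frac * tied_1  # Z_{j,l,m}
            loglik -= math.log(phi)
            hessian -= (risk_2 - frac * tied_2) / phi - z * z / phi ** 2
    return loglik, hessian


def numeric_second_derivative(f, b, h=1e-4):
    """Central finite-difference estimate of f''(b)."""
    return (f(b + h) - 2.0 * f(b) + f(b - h)) / h ** 2


# Made-up toy data with tied event times (2 and 3) and two censored subjects.
times = [1, 2, 2, 3, 3, 3, 4, 5]
events = [1, 1, 1, 1, 0, 1, 1, 0]
x = [0.5, -1.0, 1.5, 0.0, 2.0, -0.5, 1.0, 0.3]

loglik, hessian = efron_loglik_and_hessian(times, events, x, 0.3)
approx = numeric_second_derivative(
    lambda b: efron_loglik_and_hessian(times, events, x, b)[0], 0.3
)
print("log partial likelihood:", loglik)
print("closed-form second derivative:", hessian)
print("finite-difference estimate:", approx)
```

The two second-derivative values should agree to several decimal places. Hand-deriving and checking expressions like this is the kind of bookkeeping that automatic differentiation is meant to remove.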
Cite
Davidson-Pilon, C. (2019). lifelines: survival analysis in Python. Journal of Open Source Software, 4(40), 1317. https://doi.org/10.21105/joss.01317