Abstract
Background: This study aimed to develop interpretable machine learning models for identifying idiopathic central precocious puberty (ICPP) in girls without the gonadotropin-releasing hormone (GnRH) stimulation test, which is the current gold standard for diagnosing ICPP but is expensive and time-consuming. Methods: A total of 246 female paediatric patients who developed secondary sexual characteristics before 8 years of age and had undergone a GnRH stimulation test were randomly divided into a training set (172 patients, 70%) and a validation set (74 patients, 30%). Characteristic parameters were extracted from readily available clinical data and statistically analysed. The least absolute shrinkage and selection operator (LASSO) method was used to select the essential characteristic parameters associated with ICPP, which were then used to construct a logistic regression (LR) model and five machine learning (ML) models: support vector machine (SVM), Gaussian naive Bayes (GaussianNB), extreme gradient boosting (XGBoost), random forest (RF), and k-nearest neighbor (kNN). Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, false positive and false negative values, Youden’s index, accuracy, positive and negative likelihood ratios, calibration plots, and decision curve analysis (DCA). Finally, the Shapley additive explanations (SHAP) package was used to interpret the best-performing model. Results: Four essential characteristic parameters were selected using the LASSO method: uterine volume, the ratio of bone age to chronological age (BA/CA), basal follicle-stimulating hormone (FSH), and basal luteinizing hormone (LH). Based on these parameters, the LR and five ML models achieved AUROC values ranging from 0.72 to 0.96 in the training set and from 0.65 to 0.90 in the validation set for diagnosing ICPP. Among the six models, the XGBoost model performed best, achieving the highest AUROC, accuracy, specificity, and sensitivity in both the training and validation sets. Calibration plots and DCA further indicated that this model had the best calibration and clinical utility. Conclusions: An accurate and interpretable ML-based model has been developed to aid clinicians in diagnosing ICPP and to support clinical decision-making.
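Several of the evaluation measures named in the Methods (sensitivity, specificity, Youden's index, accuracy, and the positive and negative likelihood ratios) follow directly from a 2x2 confusion matrix at a chosen decision threshold. The sketch below illustrates those standard definitions only; the function name `diagnostic_metrics` and the example counts are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute threshold-based diagnostic metrics from confusion-matrix counts.

    tp/fn: ICPP cases classified as positive/negative.
    tn/fp: non-ICPP cases classified as negative/positive.
    """
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    youden = sensitivity + specificity - 1
    # Likelihood ratios are undefined when the denominator is zero;
    # infinity is returned here as a simple convention (an assumption).
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    lr_neg = (1 - sensitivity) / specificity if specificity > 0 else float("inf")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "youden_index": youden,
        "lr_positive": lr_pos,
        "lr_negative": lr_neg,
    }


# Hypothetical counts for illustration only (not results from the paper).
metrics = diagnostic_metrics(tp=30, fp=10, tn=40, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Threshold-free measures such as AUROC, calibration plots, and DCA need the predicted probabilities themselves and are not covered by this sketch.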
Tian, L., Zeng, Y., Zheng, H., & Cai, J. (2025). Interpretable XGBoost model identifies idiopathic central precocious puberty in girls using four clinical and imaging features. BMC Endocrine Disorders, 25(1). https://doi.org/10.1186/s12902-025-01983-4