Traffic Accident Severity Prediction Based on Data Cleaning and Machine Learning (Random Forest / Xgboost)

  • Zhou W

Abstract

Traffic accidents have increasingly become a global concern, significantly affecting lives and economic sustainability. The US Accidents dataset from 2016 to 2023 provides an extensive record of accidents across the United States, containing detailed accident and environmental data for each incident. This study aims to harness this rich database to predict the severity level of accidents. Our research centered on meticulous data cleaning to preserve the dataset's integrity. After preprocessing, the cleaned data was used to train Machine Learning models, primarily the Random Forest and XGBoost algorithms. These models were chosen for their established ability to handle complex datasets and produce accurate predictions, especially in scenarios with many variables. Upon application, the models demonstrated impressive efficacy. To validate the reliability and performance of our models, we employed the confusion matrix. This tool provided a clear view of the models' accuracy, revealing true positives, false negatives, and other crucial metrics. Furthermore, to enhance prediction outcomes, a Voting Classifier was implemented, combining the strengths of our primary models and thereby raising overall accuracy. The Random Forest algorithm exhibited substantial precision, while XGBoost further improved prediction accuracy. These findings underline the significant role of advanced data analytics and Machine Learning in understanding traffic accident dynamics. In conclusion, our study emphasizes that applying state-of-the-art Machine Learning techniques to well-curated datasets can substantially improve our understanding and prediction of traffic accident severity. Such insights pave the way for more effective preventive measures and safety protocols, aiming for a safer traffic environment in the future.
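The evaluation and ensemble steps named in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's implementation: the abstract does not say whether the Voting Classifier used hard or soft voting, so soft voting (averaging class probabilities) is assumed here; severity classes 1 to 4 are assumed; and the probability values are invented toy numbers standing in for the outputs of a Random Forest and an XGBoost model. They are not results from the paper.

```python
def argmax_labels(probs, labels):
    """Pick the most probable label for each sample."""
    return [labels[row.index(max(row))] for row in probs]


def soft_vote(model_probs, labels):
    """Average per-class probabilities across models, then take the argmax."""
    n_models = len(model_probs)
    averaged = []
    for per_sample in zip(*model_probs):          # one row per model, same sample
        averaged.append([sum(col) / n_models for col in zip(*per_sample)])
    return argmax_labels(averaged, labels)


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Share of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Assumption: four severity classes. All numbers below are hypothetical toy data.
labels = [1, 2, 3, 4]
y_true = [2, 2, 3, 4, 1, 2]
rf_probs = [   # stand-in for Random Forest class probabilities
    [0.1, 0.6, 0.2, 0.1], [0.1, 0.4, 0.5, 0.0], [0.0, 0.3, 0.6, 0.1],
    [0.0, 0.2, 0.5, 0.3], [0.7, 0.2, 0.1, 0.0], [0.2, 0.5, 0.2, 0.1],
]
xgb_probs = [  # stand-in for XGBoost class probabilities
    [0.1, 0.7, 0.1, 0.1], [0.0, 0.7, 0.3, 0.0], [0.1, 0.5, 0.4, 0.0],
    [0.0, 0.1, 0.3, 0.6], [0.6, 0.3, 0.1, 0.0], [0.1, 0.6, 0.2, 0.1],
]

rf_pred = argmax_labels(rf_probs, labels)
xgb_pred = argmax_labels(xgb_probs, labels)
voted = soft_vote([rf_probs, xgb_probs], labels)

for name, pred in [("rf", rf_pred), ("xgb", xgb_pred), ("vote", voted)]:
    cm = confusion_matrix(y_true, pred, labels)
    print(name, cm, round(accuracy(cm), 3))
```

Off-diagonal cells in each matrix show which severity levels a model confuses with one another, which is the information the abstract refers to when it mentions false negatives. In a real pipeline the probability arrays would come from fitted models rather than hand-written lists.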

Cite

Zhou, W. (2024). Traffic Accident Severity Prediction Based on Data Cleaning and Machine Learning (Random Forest / Xgboost). Highlights in Science, Engineering and Technology, 85, 376–388. https://doi.org/10.54097/h8cq6864
