Abstract
Highlights
What are the main findings?
The study demonstrates the effectiveness of a novel multitask learning (MTL) framework that uses large language models (LLMs) for real-time analysis of road traffic crashes (RTCs) from social media data.
GPT-2 fine-tuned for language modeling outperformed the baseline models, GPT-4o mini in zero-shot mode and XGBoost, across various classification and information retrieval tasks.
The study collected and curated a dataset of 26,226 RTC-related tweets from Australia over one year. Fifteen unique features were extracted from this dataset, six used in classification tasks and nine in information retrieval tasks.
An automated labeling system based on GPT-3.5, followed by expert verification, was developed to ensure accurate and reliable feature extraction from tweets. The resulting curated dataset serves as a resource for training and validating subsequent models.
What is the implication of the main finding?
The approach provides detailed, timely insights for emergency responders, urban planners, and policymakers.
By applying LLMs within an MTL framework, the study demonstrates an approach to real-time RTC analysis that sets the stage for future work in the field.
The curated dataset supports traffic safety measures and serves as a resource for extracting insights, developing models, and conducting further research on road safety.
Road traffic crashes (RTCs) are a global public health issue, and traditional analysis methods are often hindered by delays and incomplete data. Leveraging social media for real-time traffic safety analysis offers a promising alternative, yet effective frameworks for this integration are scarce. This study introduces a novel multitask learning (MTL) framework that uses large language models (LLMs) to analyze RTC-related tweets from Australia. We collected 26,226 traffic-related tweets from May 2022 to May 2023. Using GPT-3.5, we extracted fifteen distinct features, grouped into six classification tasks and nine information retrieval tasks. These features were then used to fine-tune GPT-2 for language modeling, which outperformed baseline models, including GPT-4o mini in zero-shot mode and XGBoost, across most tasks. Unlike traditional single-task classifiers that may miss critical details, our MTL approach simultaneously classifies RTC-related tweets and extracts detailed information in natural language. Our fine-tuned GPT-2 model achieved an average accuracy of 85% across the six classification tasks, surpassing the baseline GPT-4o mini model’s 64% and XGBoost’s 83.5%. In information retrieval tasks, our fine-tuned GPT-2 model achieved a BLEU-4 score of 0.22, a ROUGE-I score of 0.78, and a WER of 0.30, significantly outperforming the baseline GPT-4o mini model’s BLEU-4 score of 0.0674, ROUGE-I score of 0.2992, and WER of 2.0715. These results demonstrate the efficacy of our fine-tuned GPT-2 model in enhancing both classification and information retrieval, offering valuable insights for data-driven decision-making to improve road safety. This study is the first to explicitly apply social media data and LLMs within an MTL framework to enhance traffic safety.
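A note on reading the WER figures: word error rate divides the number of word-level edits (substitutions, deletions, insertions) by the length of the reference, so it can exceed 1 when a model produces far more words than the reference contains. That is one plausible reading of the baseline's WER of 2.0715. The sketch below uses the standard word-level edit-distance definition; the abstract does not say which implementation the authors used, and the function name and example strings are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: not taken from the paper's code.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical strings, not drawn from the study's dataset.
exact = word_error_rate("car crash on highway", "car crash on highway")
verbose = word_error_rate("two cars", "two cars collided near the bridge today")
print(exact, verbose)
```

In the second call the hypothesis adds five words to a two-word reference, so the rate is 5 / 2 = 2.5, which shows how a verbose zero-shot answer can score above 1 even when it contains the correct words.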
Cite
Jaradat, S., Nayak, R., Paz, A., Ashqar, H. I., & Elhenawy, M. (2024). Multitask Learning for Crash Analysis: A Fine-Tuned LLM Framework Using Twitter Data. Smart Cities, 7(5), 2422–2465. https://doi.org/10.3390/smartcities7050095