Think Before You Classify: The Rise of Reasoning Large Language Models for Consumer Complaint Detection and Classification

7Citations
Citations of this article
57Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Large language models (LLMs) have demonstrated remarkable capabilities in various natural language processing (NLP) tasks, but their effectiveness in real-world consumer complaint classification without fine-tuning remains uncertain. Zero-shot classification offers a promising solution by enabling models to categorize consumer complaints without prior exposure to labeled training data, making it valuable for handling emerging issues and dynamic complaint categories in finance. However, this task is particularly challenging, as financial complaint categories often overlap, requiring a deep understanding of nuanced language. In this study, we evaluate the zero-shot classification performance of leading LLMs and reasoning models, totaling 14 models. Specifically, we assess DeepSeek-V3, Gemini-2.0-Flash, Gemini-1.5-Pro, Anthropic’s Claude 3.5 and 3.7 Sonnet, Claude 3.5 Haiku, and OpenAI’s GPT-4o, GPT-4.5, and GPT-4o Mini, alongside reasoning models such as DeepSeek-R1, o1, and o3. Unlike traditional LLMs, reasoning models are specifically trained with reinforcement learning to exhibit advanced inferential capabilities, structured decision-making, and complex reasoning, making their application to text classification a groundbreaking advancement. The models were tasked with classifying consumer complaints submitted to the Consumer Financial Protection Bureau (CFPB) into five predefined financial classes based solely on complaint text. Performance was measured using accuracy, precision, recall, F1-score, and heatmaps to identify classification patterns. The findings highlight the strengths and limitations of both standard LLMs and reasoning models in financial text processing, providing valuable insights into their practical applications. By integrating reasoning models into classification workflows, organizations may enhance complaint resolution automation and improve customer service efficiency, marking a significant step forward in AI-driven financial text analysis.

Cite

CITATION STYLE

APA

Roumeliotis, K. I., Tselikas, N. D., & Nasiopoulos, D. K. (2025). Think Before You Classify: The Rise of Reasoning Large Language Models for Consumer Complaint Detection and Classification. Electronics (Switzerland), 14(6). https://doi.org/10.3390/electronics14061070

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free