Comparing Clustering Techniques on Brazilian Legal Document Datasets

Abstract

The Brazilian justice system is one of the largest and most virtualized in the world, yet it suffers from severe congestion. Clustering is a machine learning technique that can speed up the system by automating routine tasks while also helping to ensure legal certainty. Choosing the right clustering model, however, is not a simple task. This work empirically evaluates six approaches to clustering Brazilian legal documents. The K-Means, Mini Batch K-Means, and HDBSCAN algorithms were tested under different hyperparameter settings, each with and without Kohonen's map as a pre-clustering technique. The work also proposes a new NLP-oriented framework for evaluating textual clusters. Two databases of Brazilian legal documents were used. The results show that K-Means and Mini Batch K-Means are the best choices, and that using Kohonen's map can increase overall clustering performance.
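As a rough illustration of the kind of comparison the abstract describes, the sketch below clusters a toy corpus with K-Means and Mini Batch K-Means from scikit-learn and scores each result. Everything specific here is an assumption and not taken from the paper: the six sample sentences, the TF-IDF features, the silhouette score (a generic metric, not the proposed NLP-oriented framework), and the helper name `compare_clusterers`. HDBSCAN and the Kohonen's map pre-clustering step are left out for brevity.

```python
from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

# Toy stand-in corpus (hypothetical; not the paper's datasets).
docs = [
    "recurso de apelação em ação de cobrança",
    "apelação cível cobrança de dívida contratual",
    "ação de cobrança dívida contrato apelação",
    "habeas corpus prisão preventiva réu",
    "prisão preventiva habeas corpus liberdade do réu",
    "pedido de liberdade habeas corpus prisão",
]


def compare_clusterers(texts, n_clusters=2, seed=0):
    """Fit both algorithms on TF-IDF vectors and return labels plus a score."""
    # TF-IDF is an assumed feature representation, chosen only for brevity.
    X = TfidfVectorizer().fit_transform(texts)
    models = {
        "KMeans": KMeans(n_clusters=n_clusters, n_init=10, random_state=seed),
        "MiniBatchKMeans": MiniBatchKMeans(
            n_clusters=n_clusters, n_init=10, random_state=seed
        ),
    }
    results = {}
    for name, model in models.items():
        labels = model.fit_predict(X)
        # The silhouette score is only defined when at least two clusters appear.
        score = silhouette_score(X, labels) if len(set(labels)) > 1 else None
        results[name] = (labels, score)
    return results


if __name__ == "__main__":
    for name, (labels, score) in compare_clusterers(docs).items():
        print(name, list(labels), score)
```

On a corpus this small the two algorithms will usually agree; differences in speed and quality only become meaningful on collections of realistic size.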

Cite

Lima, J. P., & Costa, J. A. (2022). Comparing Clustering Techniques on Brazilian Legal Document Datasets. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13469 LNAI, pp. 98–110). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-15471-3_9
