Comparing Clustering Techniques on Brazilian Legal Document Datasets

João Pedro Lima; José Alfredo Costa

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2022) 13469 LNAI 98-110

DOI: 10.1007/978-3-031-15471-3_9

Abstract

The Brazilian justice system is one of the largest and most virtualized in the world, yet it suffers from severe congestion. Clustering is a machine learning technique that can speed up the system by automating daily tasks while also helping to ensure legal certainty. Choosing the right clustering model, however, is not a simple task. This work empirically evaluates six approaches to clustering Brazilian legal documents. The K-Means, Mini Batch K-Means, and HDBSCAN algorithms were tested with different hyperparameters, each with and without Kohonen's map as a pre-clustering technique. The work also proposes a new NLP-oriented framework for evaluating textual clusters. Two databases of Brazilian legal documents were used. The results show that K-Means and Mini Batch K-Means are the best choices, and that using Kohonen's map can improve overall clustering performance.
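For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of plain K-Means on term-frequency vectors. It is not the authors' pipeline: the toy documents, the `tf_vector` and `kmeans` helpers, and the deterministic farthest-point initialization are all assumptions made for illustration, and Mini Batch K-Means, HDBSCAN, and the Kohonen's map pre-clustering step are not shown.

```python
import math
from collections import Counter


def tf_vector(text, vocab):
    """L2-normalized term-frequency vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain K-Means; returns one cluster label per point."""
    # Assumption: farthest-point initialization, chosen only to keep the
    # sketch deterministic. Real implementations usually seed randomly.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy documents (not taken from the paper's datasets).
docs = [
    "appeal tax debt municipal",
    "tax debt appeal federal",
    "custody child family hearing",
    "family custody child support",
]
vocab = sorted({w for d in docs for w in d.split()})
points = [tf_vector(d, vocab) for d in docs]
labels = kmeans(points, k=2)
print(labels)
```

On this toy input the two tax-related documents should share one label and the two custody-related documents the other. A realistic comparison would use a proper text representation and an established library implementation rather than this hand-rolled loop.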

Citation (APA)

Lima, J. P., & Costa, J. A. (2022). Comparing Clustering Techniques on Brazilian Legal Document Datasets. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13469 LNAI, pp. 98–110). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-15471-3_9
