Mining generalized character n-grams in large corpora

Nuno C. Marques; Agnès Braud

Journal Article

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2003) 2902 419-423

DOI: 10.1007/978-3-540-24580-3_48

Abstract

In this paper, we study the computational cost of extracting character n-grams from a corpus. We propose an approach for reducing this cost, which is especially relevant for text mining and natural language applications. The underlying idea is to consider only n-grams that occur above a given frequency in the corpus. This approach is applied to three different corpora, allowing the extraction of all frequent n-grams in those corpora in reasonable time. © Springer-Verlag Berlin Heidelberg 2003.
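
To make the frequency-threshold idea concrete, here is a minimal sketch in Python. It is not the authors' algorithm, which the abstract does not describe; it assumes an Apriori-style level-wise search, relying on the fact that an n-gram can only be frequent if both of its (n-1)-character substrings are frequent. The function name `frequent_char_ngrams` and the parameters `min_count` and `max_n` are hypothetical, chosen for illustration only.

```python
from collections import Counter


def frequent_char_ngrams(text, min_count, max_n):
    """Return {ngram: count} for every character n-gram (1 <= n <= max_n)
    that occurs at least min_count times in text.

    Illustrative sketch only: assumes Apriori-style pruning, not the
    method used in the paper.
    """
    frequent = {}

    # Level 1: count single characters and keep the frequent ones.
    prev = {g: c for g, c in Counter(text).items() if c >= min_count}
    frequent.update(prev)

    n = 2
    while prev and n <= max_n:
        counts = Counter()
        for i in range(len(text) - n + 1):
            gram = text[i:i + n]
            # Pruning step: skip any candidate whose prefix or suffix
            # of length n-1 was not frequent at the previous level.
            if gram[:-1] in prev and gram[1:] in prev:
                counts[gram] += 1
        prev = {g: c for g, c in counts.items() if c >= min_count}
        frequent.update(prev)
        n += 1

    return frequent


if __name__ == "__main__":
    for gram, count in sorted(frequent_char_ngrams("abababcab", 2, 3).items()):
        print(repr(gram), count)
```

In this sketch, raising `min_count` shrinks the set of candidates kept at each level, so fewer distinct n-grams have to be stored as n grows; each level still requires one pass over the text.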

Cite

Marques, N. C., & Braud, A. (2003). Mining generalized character n-grams in large corpora. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2902, 419–423. https://doi.org/10.1007/978-3-540-24580-3_48
