In this paper, we study the computational cost of extracting character n-grams from a corpus. We propose an approach for reducing this cost, which is especially relevant for text mining and natural language applications. The underlying idea is to consider only n-grams whose frequency in the corpus exceeds a given threshold. This approach is applied to three different corpora, allowing all frequent n-grams in those corpora to be extracted in reasonable time. © Springer-Verlag Berlin Heidelberg 2003.
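The idea of restricting extraction to frequent n-grams can be illustrated with a minimal sketch. The abstract does not describe the authors' algorithm, so the code below is only an assumption about how such a restriction might be exploited: it uses level-wise (Apriori-style) pruning, which relies on the fact that an n-gram can reach the threshold only if both of its (n-1)-gram substrings do. The function name `frequent_char_ngrams` and the parameters `min_count` and `max_n` are hypothetical and do not come from the paper.

```python
from collections import Counter


def frequent_char_ngrams(text, min_count, max_n):
    """Return {n: Counter} of character n-grams seen at least min_count times.

    Illustrative sketch only (not the paper's method): candidates of
    length n are counted only if their prefix and suffix of length n-1
    were already found to be frequent.
    """
    frequent = {}
    prev = None  # frequent (n-1)-grams from the previous level
    for n in range(1, max_n + 1):
        counts = Counter()
        for i in range(len(text) - n + 1):
            gram = text[i:i + n]
            # Skip any n-gram that contains an infrequent (n-1)-gram:
            # it cannot reach the threshold itself.
            if prev is not None and (gram[:-1] not in prev or gram[1:] not in prev):
                continue
            counts[gram] += 1
        level = Counter({g: c for g, c in counts.items() if c >= min_count})
        if not level:
            break  # no frequent n-grams, so no longer ones exist either
        frequent[n] = level
        prev = level
    return frequent


if __name__ == "__main__":
    for n, grams in frequent_char_ngrams("abracadabra", min_count=2, max_n=4).items():
        print(n, dict(grams))
```

Under this sketch, raising `min_count` shrinks the candidate set at every level, which is where the cost reduction would come from; the text is still scanned once per n-gram length.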
Marques, N. C., & Braud, A. (2003). Mining generalized character n-grams in large corpora. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2902, 419–423. https://doi.org/10.1007/978-3-540-24580-3_48