Efficient n-gram, Skipgram and Flexgram Modelling with Colibri Core

  • Van Gompel M
  • Van den Bosch A

Abstract

Counting n-grams lies at the core of any frequentist corpus analysis and is often considered a trivial matter. Going beyond consecutive n-grams to patterns such as skipgrams and flexgrams increases the demand for efficient solutions. The need to operate on big corpus data does so even more. Lossless compression and non-trivial algorithms are needed to lower the memory demands, yet retain good speed. Colibri Core is software for the efficient computation and querying of n-grams, skipgrams and flexgrams from corpus data. The resulting pattern models can be analysed and compared in various ways. The software offers a programming library for C++ and Python, as well as command-line tools.
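
To make the three pattern types concrete, here is a minimal plain-Python sketch of counting them. It does not use Colibri Core or its API, and it makes no attempt at the compression or memory efficiency the abstract describes. The function names and the gap markers `{*}` and `{**}` are hypothetical, chosen only for this illustration. The sketch assumes that a skipgram is an n-gram in which one or more inner tokens are replaced by fixed-width gaps, and that a flexgram is the same kind of pattern with gaps of variable width.

```python
from collections import Counter
from itertools import combinations

GAP = "{*}"    # hypothetical marker: a gap of exactly one token
FLEX = "{**}"  # hypothetical marker: a gap of one or more tokens


def ngrams(tokens, n):
    """Yield every consecutive n-gram in tokens as a tuple."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def skipgrams(tokens, n):
    """Yield skipgrams of total length n.

    Each n-gram is abstracted by replacing every non-empty subset of its
    inner positions with GAP; the first and last tokens are always kept.
    """
    inner = range(1, n - 1)
    for gram in ngrams(tokens, n):
        for k in range(1, n - 1):  # number of gapped positions
            for positions in combinations(inner, k):
                yield tuple(
                    GAP if i in positions else tok
                    for i, tok in enumerate(gram)
                )


def to_flexgram(skipgram):
    """Collapse each run of GAP markers into one variable-width FLEX marker."""
    out = []
    for tok in skipgram:
        if tok == GAP:
            if not out or out[-1] != FLEX:
                out.append(FLEX)
        else:
            out.append(tok)
    return tuple(out)


tokens = "a b c a d c".split()
bigram_counts = Counter(ngrams(tokens, 2))
skipgram_counts = Counter(skipgrams(tokens, 3))
flexgram_counts = Counter(to_flexgram(s) for s in skipgrams(tokens, 4))
```

On this toy input, the skipgram `("a", "{*}", "c")` is counted twice, because both "a b c" and "a d c" match it. Holding every pattern as a tuple of strings in a dictionary like this is exactly the naive approach whose memory cost grows quickly on large corpora.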

Cite

Van Gompel, M., & Van den Bosch, A. (2016). Efficient n-gram, Skipgram and Flexgram Modelling with Colibri Core. Journal of Open Research Software, 4(1), 30. https://doi.org/10.5334/jors.105