Printed Romanian modelling: A corpus linguistics based study with orthography and punctuation marks included

Abstract

This paper is part of a larger study by the authors that describes printed Romanian as an information source. Here, the statistical investigation seeks the mathematical model of the language when orthography and punctuation marks are included in the alphabet. To obtain an accurate result, the authors processed multiple data sets sampled from a linguistic corpus, using the following statistical inferences: probability estimation with multiple confidence intervals, a test of the hypothesis that the probability belongs to an interval, and a test of the equality of two probabilities. The probability of a type II error in the tests was taken into account. The experimental results, which are new for printed Romanian, concern the letter, digram and trigram statistical structure of a corpus of 93 books (about 50 million characters). © Springer-Verlag Berlin Heidelberg 2007.
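As a rough illustration of the kind of inference the abstract lists, the sketch below counts character n-grams with spaces and punctuation kept in the alphabet, then computes a confidence interval for one symbol's probability and a statistic for comparing two probabilities. It is not the authors' procedure: the normal-approximation (Wald) interval and the pooled two-proportion z statistic are stand-ins chosen for brevity, the function names and sample strings are hypothetical, and the formulas treat observations as independent, which consecutive characters in natural text generally are not.

```python
import math
from collections import Counter


def ngram_counts(text, n):
    """Count overlapping character n-grams; spaces and punctuation are kept."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def wald_interval(k, total, z=1.96):
    """Normal-approximation interval for a proportion k/total (z=1.96 ~ 95%)."""
    p = k / total
    half = z * math.sqrt(p * (1 - p) / total)
    return max(0.0, p - half), min(1.0, p + half)


def two_proportion_z(k1, n1, k2, n2):
    """z statistic for H0: p1 == p2, using the pooled estimate.

    Assumes the pooled proportion is strictly between 0 and 1.
    """
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (k1 / n1 - k2 / n2) / se


# Hypothetical toy samples; a real study would draw data sets from the corpus.
sample_a = "Ana are mere, iar Maria are pere."
sample_b = "Maria are mere; Ana are pere, nu mere."

counts_a = ngram_counts(sample_a, 1)
counts_b = ngram_counts(sample_b, 1)
n_a, n_b = sum(counts_a.values()), sum(counts_b.values())

low, high = wald_interval(counts_a["a"], n_a)
z = two_proportion_z(counts_a["a"], n_a, counts_b["a"], n_b)
print(f"p('a') in sample A: [{low:.3f}, {high:.3f}]")
print(f"z statistic for p_A('a') == p_B('a'): {z:.3f}")
```

Calling `ngram_counts(text, 2)` or `ngram_counts(text, 3)` gives the digram and trigram counts in the same way; with samples this small the intervals are very wide, so the example only shows the mechanics.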

Cite

Vlad, A., Mitrea, A., & Mitrea, M. (2007). Printed Romanian modelling: A corpus linguistics based study with orthography and punctuation marks included. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4705 LNCS, pp. 409–423). Springer Verlag. https://doi.org/10.1007/978-3-540-74472-6_33
