Discernment of Nativeness of English Documents Based on Statistical Hypothesis Testing

  • Tomiura Y
  • Aoki S
  • Shibata M
  • et al.

Abstract

This paper proposes a high-precision method for discerning the nativeness of English documents, based on the Bayes decision rule and statistical hypothesis testing. Treating a document as a sequence of parts of speech, the method compares the probability of the document under a statistical language model of native English with its probability under a model of non-native English. The language model used is an n-gram model. An n-gram model with a large n can be expected to capture the differences between native and non-native English well, and so has the potential to discern nativeness with high precision. However, as n grows, the zero-frequency and data-sparseness problems become acute, and the maximum likelihood estimates of the n-gram probabilities can no longer be relied on. The proposed method therefore uses statistical hypothesis testing to estimate the ratio of the document's probability under the native English model to its probability under the non-native English model. In experiments, the proposed method discerned nativeness with a precision of 92.5%, which is significantly higher than that of traditional methods.
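
As a rough illustration of the comparison step described above, the Python sketch below trains one part-of-speech bigram model on native documents and one on non-native documents, then labels a new tag sequence by comparing the log probability ratio against the log prior odds (the Bayes decision rule). It assumes documents are already POS-tagged; the class PosNgramModel, the function discern, and the toy tag sequences are hypothetical. It also swaps in simple add-k smoothing to sidestep the zero-frequency problem; it does not implement the paper's hypothesis-testing estimate of the probability ratio.

```python
import math
from collections import Counter
from typing import Iterable, Sequence

BOS, EOS = "<s>", "</s>"


class PosNgramModel:
    """Hypothetical POS-tag n-gram model with add-k smoothing.

    Add-k smoothing is a stand-in chosen for this sketch; it is NOT the
    hypothesis-testing estimator proposed in the paper.
    """

    def __init__(self, n: int, tagset: Iterable[str], k: float = 1.0):
        self.n = n
        self.k = k
        # Possible outcomes: every POS tag plus the end-of-document marker.
        self.vocab_size = len(set(tagset)) + 1
        self.ngram_counts: Counter = Counter()
        self.context_counts: Counter = Counter()

    def _pad(self, tags: Sequence[str]) -> list:
        return [BOS] * (self.n - 1) + list(tags) + [EOS]

    def fit(self, documents: Iterable[Sequence[str]]) -> "PosNgramModel":
        for tags in documents:
            padded = self._pad(tags)
            for i in range(self.n - 1, len(padded)):
                context = tuple(padded[i - self.n + 1:i])
                self.ngram_counts[context + (padded[i],)] += 1
                self.context_counts[context] += 1
        return self

    def log_prob(self, tags: Sequence[str]) -> float:
        """Natural-log probability of a whole POS sequence under the model."""
        padded = self._pad(tags)
        total = 0.0
        for i in range(self.n - 1, len(padded)):
            context = tuple(padded[i - self.n + 1:i])
            count = self.ngram_counts[context + (padded[i],)]
            # With k = 0 this is the maximum likelihood estimate, and an unseen
            # n-gram or context raises an error (zero-frequency problem).
            total += math.log((count + self.k) /
                              (self.context_counts[context] + self.k * self.vocab_size))
        return total


def discern(tags: Sequence[str], native: PosNgramModel,
            non_native: PosNgramModel, prior_native: float = 0.5) -> str:
    """Bayes decision: choose the class with the larger posterior probability."""
    # log P(d | native) - log P(d | non-native), compared with log prior odds.
    log_ratio = native.log_prob(tags) - non_native.log_prob(tags)
    threshold = math.log((1.0 - prior_native) / prior_native)
    return "native" if log_ratio >= threshold else "non-native"


# Hypothetical toy data (Penn Treebank-style tags), for illustration only.
TAGS = ["DT", "NN", "VBZ", "JJ", "PRP", "VBD", "RB"]
native = PosNgramModel(2, TAGS).fit([
    ["DT", "NN", "VBZ", "DT", "JJ", "NN"],
    ["PRP", "VBD", "DT", "NN"],
    ["DT", "JJ", "NN", "VBZ", "RB"],
])
non_native = PosNgramModel(2, TAGS).fit([
    ["NN", "VBZ", "JJ", "NN"],
    ["PRP", "VBD", "NN"],
    ["NN", "VBZ", "RB", "JJ"],
])
print(discern(["DT", "NN", "VBZ", "DT", "NN"], native, non_native))  # native
print(discern(["NN", "VBZ", "JJ", "NN"], native, non_native))  # non-native
```

With k = 0 the models fall back to maximum likelihood estimates, and unseen transitions in the non-native data (such as a document starting with DT) would make the first example raise an error outright; with larger n such gaps become far more common, which is the sparseness problem the abstract describes.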

Citation (APA)

Tomiura, Y., Aoki, S., Shibata, M., & Yukino, K. (2009). Discernment of Nativeness of English Documents Based on Statistical Hypothesis Testing. Journal of Natural Language Processing, 16(1), 25–46. https://doi.org/10.5715/jnlp.16.1_25
