An Attempt to Use Weighted Cusums to Identify Sublanguages

Abstract

This paper explores the use of weighted cusums, a technique found in authorship attribution studies, for the purpose of identifying sublanguages. The technique and its relation to standard cusums (cumulative sum charts) are first described, and the formulae for the calculations are given in detail. The technique compares texts by testing for the incidence of linguistic 'features' of a superficial nature, e.g. the proportion of 2- and 3-letter words, words beginning with a vowel, and so on, and measures whether two texts differ significantly in respect of these features. The paper describes an experiment in which 14 groups of three texts each, representing different sublanguages, are compared with each other using the technique. The texts are first compared within each group to establish that the technique can identify the groups as being homogeneous. The texts are then compared with each other, and the results analysed. Taking the average of seven different tests, the technique is able to distinguish the sublanguages in only 43% of cases. But if the best score is taken, 79% of pairings can be distinguished. This is a better result, and the test seems able to quantify the difference between sublanguages.
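To make the kind of superficial feature counting described in the abstract concrete, the following is a minimal sketch in Python. It counts, per sentence, the words that match a feature and then builds a standard (unweighted) cusum of those counts. The function names, the regex-based sentence splitting, and the two feature predicates are assumptions made for illustration. The weighting scheme and the significance test used in the paper are not given in the abstract and are not reproduced here.

```python
import re

# Hypothetical feature predicates, modelled on the examples named in the abstract.
def is_short_word(word):
    """True for 2- and 3-letter words."""
    return len(word) in (2, 3)

def starts_with_vowel(word):
    """True for words whose first letter is a vowel."""
    return word[0].lower() in "aeiou"

def sentence_counts(text, feature):
    """Return a list of (word_count, feature_count) pairs, one per sentence.

    Sentences are split naively on ., ! and ?, which is an assumption
    of this sketch rather than the paper's procedure.
    """
    counts = []
    for sentence in re.split(r"[.!?]+", text):
        words = re.findall(r"[A-Za-z]+", sentence)
        if not words:
            continue
        hits = sum(1 for w in words if feature(w))
        counts.append((len(words), hits))
    return counts

def cusum(values):
    """Standard cusum: running total of each value's deviation from the mean."""
    mean = sum(values) / len(values)
    running = 0.0
    chart = []
    for v in values:
        running += v - mean
        chart.append(running)
    return chart

if __name__ == "__main__":
    sample = "The cat sat on a mat. An owl is up in the old oak tree. Dogs bark loudly."
    per_sentence = sentence_counts(sample, is_short_word)
    feature_hits = [hits for _, hits in per_sentence]
    print(per_sentence)
    print(cusum(feature_hits))
```

A standard cusum always returns to zero at the last point, since the deviations from the mean sum to zero. Comparing two texts would require a further step, such as the weighted variant and significance measure the paper describes, which this sketch leaves out.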

Cite

Somers, H. (1998). An Attempt to Use Weighted Cusums to Identify Sublanguages. In Proceedings of the Joint Conference on New Methods in Language Processing and Computational Natural Language Learning, NeMLaP/CoNLL 1998 (pp. 131–139). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1603899.1603922
