Orthographic measures of language distances between the official South African languages

  • Zulu P
  • Botha G
  • Barnard E

Abstract

Two methods for objectively measuring similarities and dissimilarities between the eleven official languages of South Africa are described. The first is based on n-grams: the confusions between languages in a text-based language identification system can be used to derive information on the relationships between them. Our classifier calculates n-gram statistics from text documents and uses these statistics as features in classification. We show that the classification results of a validation test can serve as a similarity measure between languages, and we use these similarity measures to represent the relationships graphically. The second method applies the Levenshtein distance to orthographic word transcriptions from the eleven languages under investigation. Hierarchical clustering of the resulting distances shows the relationships between the languages in terms of regional groupings and closeness. Both multidimensional scaling and dendrogram analysis yield results similar to well-known language groupings, and also suggest a finer level of detail on these relationships.
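The two ingredients named in the abstract, character n-gram statistics and the Levenshtein distance, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the default n-gram length of 3, and the normalisation by the longer word's length are assumptions made here for illustration, and the abstract does not specify the classifier or the word lists used.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in a text (n=3 is an assumed default)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def mean_normalized_distance(pairs):
    """Average Levenshtein distance over (word1, word2) pairs, each divided by
    the longer word's length. This normalisation is an assumption, not taken
    from the abstract."""
    total = 0.0
    for w1, w2 in pairs:
        total += levenshtein(w1, w2) / max(len(w1), len(w2), 1)
    return total / len(pairs)
```

Applied to aligned word lists for every pair of languages, a function such as `mean_normalized_distance` would yield a symmetric distance matrix, which is the kind of input that hierarchical clustering and multidimensional scaling operate on.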

Zulu, P. N., Botha, G., & Barnard, E. (2008). Orthographic measures of language distances between the official South African languages. Literator, 29(1). https://doi.org/10.4102/lit.v29i1.106