Semi-automatic term extraction for the African languages, with special reference to Northern Sotho

16Citations
Citations of this article
11Readers
Mendeley users who have this article in their library.

Abstract

Worldwide, semi-automatically extracting terms from corpora is becoming the norm for the compilation of terminology lists, term banks or dictionaries for special purposes. If African-language terminologists are willing to take their rightful place in the new millennium, they must not only take cognisance of this trend but also be ready to implement the new technology. In this article it is advocated that the best way to do the latter two at this stage, is to opt for computationally straightforward alternatives (i.e. use 'raw corpora') and to make use of widely available software tools (e.g. WordSmith Tools). The main aim is therefore to discover whether or not the semi-automatic extraction of terminology from untagged and unmarked running text by means of basic corpus query software is feasible for the African languages. In order to answer this question a full-blown case study revolving around Northern Sotho linguistic texts is discussed in great detail. The computational results are compared throughout with the outcome of a manual excerption, and vice versa. Attention is given to the concepts 'recall' and 'precision'; different approaches are suggested for the treatment of single-word terms versus multi-word terms; and the various findings are summarised in a Linguistics Terminology lexicon presented as an Appendix.

Cite

CITATION STYLE

APA

Taljard, E., & De Schryver, G. M. (2002). Semi-automatic term extraction for the African languages, with special reference to Northern Sotho. Lexikos, 12, 44–74. https://doi.org/10.5788/12-0-760

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free