This chapter describes the basics of text processing and gives an overview of standard methods and techniques: preprocessing of texts, such as tokenisation and text segmentation; word processing, such as morphological processing, lemmatisation, stemming, compound splitting, and abbreviation detection and expansion; sentence-based methods, such as part-of-speech tagging and syntactical analysis or parsing; and semantic analysis, such as named entity recognition, negation detection, relation extraction, temporal processing and anaphora resolution. Generally, the same building blocks used for regular texts can also be utilised for clinical text processing. However, clinical texts contain more noise in the form of incomplete sentences, misspelled words and non-standard abbreviations, which can make natural language processing cumbersome. For more details on the concepts in this section, see the following comprehensible textbooks in computational linguistics: Mitkov (2005), Jurafsky and Martin (2014) and Clark et al. (2013).

7.1 Definitions

Natural language processing (NLP) is the traditional term for intelligent text processing, where a computer program tries to interpret what is written in natural language text or speech using computational linguistic methods. Other common terms for NLP are computational linguistics, language engineering and language technology. Information retrieval (IR) may use NLP methods, but the aim of IR is to find a specific document in a document collection, while the aim of information extraction (IE) is to find specific information in a document or in a document collection. A popular term today is text mining, which means finding previously unknown facts in a text collection or building a hypothesis that is later to be proven. Text mining is used in a broad sense in the literature, sometimes meaning the use of machine learning-based
Dalianis, H. (2018). Basic Building Blocks for Clinical Text Processing. In Clinical Text Mining (pp. 55–82). Springer International Publishing. https://doi.org/10.1007/978-3-319-78503-5_7