Computational Methods for Text Analysis and Text Classification

  • Dalianis H

Abstract

In this chapter the differences between rule-based systems and machine learning-based systems, along with their respective pros and cons, will be explained. The principles of machine learning-based systems such as Conditional Random Fields (CRF) and Support Vector Machines (SVM) will be presented, together with the Weka toolkit, which supports several machine learning algorithms and evaluation packages. Feature extraction for improving machine learning results will be described, including POS-tagging, stemming and lemmatisation, as well as statistical calculations based on tf-idf to filter out relevant words. Active learning, which is used to select the optimal data to be annotated, will also be covered. Different machine learning approaches such as topic modelling, distributional semantics and clustering will be presented. Text is preprocessed into different knowledge representations, such as the vector space model and the word space model, and these representations are adapted for different computational methods. The results produced by both rule-based and machine learning-based systems will be explained. Ready-made computational linguistics modules for English clinical text mining, such as MedLEE and cTAKES, will also be presented, as well as some basic tools such as NLTK and GATE, which need to be adapted to clinical text mining.

8.1 Rule-Based Methods

The rule-based method is the classical programming paradigm. A human programmer or software engineer writes rules to mimic the required behavior of a program. The programmer studies a flow chart of how the program should react depending on its input data. The programmer may also study the input data together with the required output data and try to implement that mapping in the program. The rules can be of any format: a grammar for parsing text, regular expressions to extract parts of
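
As a minimal sketch of the rule-based paradigm described above, the Python snippet below encodes a few hand-written rules as regular expressions and applies them to a short clinical-style sentence. The rule labels (`DOSE`, `DATE`, `NEGATION`), the patterns, the `extract_mentions` function and the example note are all hypothetical illustrations, not rules taken from the chapter.

```python
import re

# Hypothetical hand-written rules: each label maps to a regular expression
# that a programmer might write after studying example input and desired output.
RULES = {
    # A dose: a number (optionally with decimals), an optional space, then mg, g or ml.
    "DOSE": re.compile(r"\b\d+(?:\.\d+)?\s?(?:mg|g|ml)\b", re.IGNORECASE),
    # A date in ISO format, e.g. 2018-03-14.
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    # A small, hypothetical list of negation cue words.
    "NEGATION": re.compile(r"\b(?:no|not|denies|without)\b", re.IGNORECASE),
}


def extract_mentions(text):
    """Apply every rule to the text.

    Returns a list of (label, matched_text, start_offset) tuples,
    sorted by the position where each match starts.
    """
    mentions = []
    for label, pattern in RULES.items():
        for match in pattern.finditer(text):
            mentions.append((label, match.group(0), match.start()))
    return sorted(mentions, key=lambda mention: mention[2])


if __name__ == "__main__":
    note = "2018-03-14: Patient denies chest pain. Started metformin 500 mg twice daily."
    for label, span, start in extract_mentions(note):
        print(f"{start:>3} {label:<9} {span}")
```

This illustrates a typical trade-off of rule-based systems: every rule is transparent and easy to correct, but each new way of writing a dose, a date or a negation has to be anticipated and added by hand.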

Dalianis, H. (2018). Computational Methods for Text Analysis and Text Classification. In Clinical Text Mining (pp. 83–96). Springer International Publishing. https://doi.org/10.1007/978-3-319-78503-5_8
