Machine Learning for Higher-Level Linguistic Tasks

  • Rumshisky A
  • Stubbs A

Abstract

Annotation is one of the main vehicles for supplying knowledge to machine learning systems built to automate text processing tasks. In this chapter, we discuss how linguistic annotation is used in machine learning for different natural language processing (NLP) tasks. Specifically, we focus on how different layers of annotation are leveraged in tasks that aim to discover higher-level linguistic information. We describe how machine learning fits into the annotation process in the MATTER cycle, discuss some common machine learning algorithms used in NLP, explain the fundamentals of feature selection, and explore methods for leveraging limited quantities of annotated data. We close with a case study of the 2012 i2b2 NLP shared task, which targeted temporal information extraction, a higher-level task that requires a synthesis of information from multiple linguistic levels.

Keywords: Machine learning; Natural language processing; Annotation

Appendix: Machine Learning Resources and Toolkits

For more information on the inner workings of ML algorithms, we highly recommend the following books:

  • Daniel Jurafsky and James H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 2nd edition. Prentice-Hall, 2009.
  • Christopher Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. The MIT Press, 1999.
  • Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. The MIT Press, 2013.

A variety of toolkits are available for building ML systems. These toolkits provide implementations of different ML algorithms, allowing NLP researchers to focus on supplying the appropriate feature sets to maximize the accuracy of the ML system's results. Many machine-learning systems for NLP are free and open source; here is a short list of commonly used ML toolkits and other systems:

  • NLTK: http://www.nltk.org/
  • GATE: http://gate.ac.uk/
  • WEKA: http://www.cs.waikato.ac.nz/ml/weka/
  • LingPipe: http://alias-i.com/lingpipe/index.html
  • MALLET: http://mallet.cs.umass.edu/
  • Stanford NLP tools: http://nlp.stanford.edu/software/index.shtml

The NLTK also has an accompanying book, "Natural Language Processing with Python" by Steven Bird, Ewan Klein, and Edward Loper [4]. In addition to providing implementations of many machine learning algorithms that users can train for their own tasks, many of these toolkits include already-trained systems for common NLP tasks such as part-of-speech tagging, named entity recognition, dependency parsing, and so on; a small example of applying such pre-trained components follows below. This additional functionality is extremely important for many NLP tasks.
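As a minimal sketch (not taken from the chapter), the Python snippet below shows how a toolkit's pre-trained models can be applied off the shelf, here using NLTK's default English tokenizer, POS tagger, and named-entity chunker. It assumes a standard NLTK installation; the resource identifiers passed to nltk.download() are NLTK's usual names and may vary slightly across NLTK versions.

    # Sketch: using NLTK's pre-trained models for POS tagging and NER.
    import nltk

    # Fetch the pre-trained models on first use (standard NLTK resource names;
    # adjust if your NLTK version uses different identifiers).
    for resource in ["punkt", "averaged_perceptron_tagger",
                     "maxent_ne_chunker", "words"]:
        nltk.download(resource, quiet=True)

    sentence = "The 2012 i2b2 shared task targeted temporal information extraction."

    tokens = nltk.word_tokenize(sentence)   # tokenization
    tagged = nltk.pos_tag(tokens)           # part-of-speech tagging
    entities = nltk.ne_chunk(tagged)        # named-entity chunking over POS tags

    print(tagged)    # list of (token, POS tag) pairs
    print(entities)  # tree with named-entity chunks

Because the models ship with the toolkit, no annotated training data or feature engineering is needed to obtain these lower-level annotations, which can then serve as input features for higher-level tasks such as temporal information extraction.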

Citation (APA)

Rumshisky, A., & Stubbs, A. (2017). Machine Learning for Higher-Level Linguistic Tasks. In Handbook of Linguistic Annotation (pp. 333–351). Springer Netherlands. https://doi.org/10.1007/978-94-024-0881-2_13
