Information Extraction of Multiple Categories from Pathology Reports

  • Yue Li
  • David Martinez

Pathology reports record information about a patient's cells and tissues, and they are crucial for monitoring the health of individuals and population groups. In this work we present an evaluation of supervised text classification models for predicting relevant categories in pathology reports. Our aim is to integrate automatic classifiers into the current workflow of medical experts in order to improve it, and we implement and evaluate different machine learning approaches for a large number of categories. Our results show that we can predict nominal categories with a high average F-score (81.3%), and that we can improve over the majority-class baseline by relying on Naive Bayes and feature selection. We also find that the classification of numeric categories is harder, and that deeper analysis would be required to predict these labels.
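To make the comparison described above concrete, the following is a minimal sketch of a majority-class baseline set against a multinomial Naive Bayes classifier with a simple feature selection step, scored by average F-score. It is not the authors' implementation. The report snippets, the labels `malignant` and `benign`, the document-frequency-gap selection criterion, the value `k=8`, and the choice of macro-averaging are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Invented toy snippets and hypothetical labels, for illustration only.
TRAIN = [
    ("invasive carcinoma present in margin", "malignant"),
    ("tumour cells infiltrate lymph node", "malignant"),
    ("carcinoma with lymph node involvement", "malignant"),
    ("no evidence of malignancy benign tissue", "benign"),
    ("benign naevus completely excised", "benign"),
]
TEST = [
    ("invasive tumour cells in lymph node", "malignant"),
    ("benign tissue completely excised", "benign"),
    ("carcinoma present", "malignant"),
    ("no evidence of tumour benign naevus", "benign"),
]


def tokenize(text):
    return text.lower().split()


def select_features(examples, k):
    """Keep the k tokens whose per-class document frequency differs most.

    This is a crude stand-in (an assumption) for a real feature selection method.
    """
    class_sizes = Counter(label for _, label in examples)
    doc_freq = defaultdict(Counter)
    for text, label in examples:
        for tok in set(tokenize(text)):
            doc_freq[tok][label] += 1

    def score(tok):
        rates = [doc_freq[tok][c] / class_sizes[c] for c in class_sizes]
        return max(rates) - min(rates)

    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(doc_freq, key=lambda t: (-score(t), t))
    return set(ranked[:k])


def train_nb(examples, vocab):
    """Count classes and in-vocabulary tokens per class."""
    class_counts = Counter(label for _, label in examples)
    token_counts = {label: Counter() for label in class_counts}
    for text, label in examples:
        token_counts[label].update(t for t in tokenize(text) if t in vocab)
    return class_counts, token_counts


def predict_nb(model, vocab, text):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_counts, token_counts = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        denom = sum(token_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:
                score += math.log((token_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over the classes present in gold."""
    scores = []
    for label in sorted(set(gold)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores)


gold = [label for _, label in TEST]

# Majority-class baseline: always predict the most frequent training label.
majority = Counter(label for _, label in TRAIN).most_common(1)[0][0]
baseline_pred = [majority] * len(TEST)
baseline_f1 = macro_f1(gold, baseline_pred)

# Naive Bayes restricted to the selected features.
vocab = select_features(TRAIN, k=8)
model = train_nb(TRAIN, vocab)
nb_pred = [predict_nb(model, vocab, text) for text, _ in TEST]
nb_f1 = macro_f1(gold, nb_pred)

print(f"baseline macro-F1: {baseline_f1:.3f}")
print(f"Naive Bayes macro-F1: {nb_f1:.3f}")
```

On real reports, each category would typically get its own classifier, and a library implementation would be preferable to this hand-rolled version. The sketch only shows why a majority-class baseline can score poorly on average F-score: it never predicts the minority class.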
