LISC: A Python Package for Scientific Literature Collection and Analysis

  • Donoghue T

Abstract

The scientific literature is vast and ever-expanding. For example, the PubMed database, a curated database of literature from the biomedical sciences, holds more than 30 million published scientific articles and is continuously growing. Given the scale of the literature, work across informatics, information sciences, and bibliometrics has explored automated methods for curating, and drawing inferences from, the existing literature. This work is sometimes referred to as knowledge discovery, literature-based discovery, or hypothesis generation (Spangler et al., 2014; Stegmann & Grohmann, 2003; J. B. Voytek & Voytek, 2012). Here, we introduce ‘Literature Scanner’, or ‘LISC’, an open-source Python module for performing automated meta-analyses of scientific articles by collecting and analyzing data from the scientific literature. LISC seeks to provide an easily accessible interface to external resources that make data available through application programming interfaces (APIs). For example, LISC connects to the PubMed database, providing access to collect and analyze biomedical literature, and to the OpenCitations database (Heibi, Peroni, & Shotton, 2019), providing access to citation data. LISC is designed to be extensible, so that additional APIs can be integrated. LISC also includes support and utilities for analyzing the collected literature data.
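As a rough illustration of what requesting data through such an API involves, the sketch below builds a search URL for the public NCBI E-utilities `esearch` endpoint, asking only for the number of matching articles. It is not LISC's own code: the function name `build_count_url` and the way the query is quoted are assumptions made for this example, and no network request is sent.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_count_url(term_a, term_b=None, db="pubmed"):
    """Return an esearch URL that asks only for the number of matching articles.

    Hypothetical helper for illustration; LISC's real URL handling may differ.
    """
    # Quote each term so that multi-word terms are searched as phrases.
    query = f'"{term_a}"'
    if term_b is not None:
        # Joining with AND restricts results to articles matching both terms.
        query += f' AND "{term_b}"'
    params = {"db": db, "term": query, "rettype": "count"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = build_count_url("alpha oscillations", "attention")
print(url)
```

Fetching the resulting URL and parsing the returned count would be the next step; a real collection tool would also need to respect the provider's rate limits.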
LISC currently offers the following types of literature data collection:

  • Counts: tools to collect and analyze data on the co-occurrence of specified search terms
  • Words: tools to collect and analyze text and meta-data from scientific articles
  • Citations: tools to collect and analyze citation and reference data

To support use cases for collecting and analyzing literature data, LISC includes:

  • URL management and requesting for interacting with integrated APIs
  • custom data objects for managing collected data
  • a database structure, as well as save and load utilities for storing collected data
  • functions and utilities to analyze collected data
  • data visualization for plotting collected data and analysis outputs

LISC is organized as an object-oriented tool, and aims to be a general utility that can be expanded to include new databases, APIs, and analyses as new resources and tools are integrated.
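To make the idea of a co-occurrence count and a custom data object concrete, here is a minimal sketch that works on a small in-memory list of texts rather than on a literature database. The class `CooccurrenceCounts`, its fields, and the toy documents are all invented for this example and are not part of LISC; matching is done by simple case-insensitive substring search, which a real tool would likely refine.

```python
from dataclasses import dataclass, field


@dataclass
class CooccurrenceCounts:
    """Hypothetical container for pairwise term co-occurrence counts."""

    terms_a: list
    terms_b: list
    counts: list = field(default_factory=list)

    def compute(self, documents):
        """Count the documents that mention both terms, for every (a, b) pair."""
        lowered = [doc.lower() for doc in documents]
        # Rows follow terms_a, columns follow terms_b.
        self.counts = [
            [
                sum(a.lower() in doc and b.lower() in doc for doc in lowered)
                for b in self.terms_b
            ]
            for a in self.terms_a
        ]
        return self.counts


# Made-up example texts standing in for collected abstracts.
docs = [
    "Frontal theta power increases with working memory load.",
    "Alpha oscillations are modulated by spatial attention.",
    "Theta and alpha rhythms both relate to attention.",
]

result = CooccurrenceCounts(["theta", "alpha"], ["attention", "memory"])
result.compute(docs)
print(result.counts)
```

Keeping the terms and the count matrix together in one object is what makes it straightforward to save, reload, analyze, or plot the results later.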

Cite

Donoghue, T. (2019). LISC: A Python Package for Scientific Literature Collection and Analysis. Journal of Open Source Software, 4(41), 1674. https://doi.org/10.21105/joss.01674
