GeoBoost2: A natural language processing pipeline for GenBank metadata enrichment for virus phylogeography

Abstract

We present GeoBoost2, a natural language processing pipeline that extracts the locations of infected hosts. These locations enrich the metadata of nucleotide sequence repositories, such as the National Center for Biotechnology Information's (NCBI) GenBank, for downstream analyses including phylogeography and genomic epidemiology. The growing number of pathogen sequences calls for complementary information extraction methods to support focused research, including surveillance within countries and across borders. In this article, we describe the enhancements over our earlier release: improved end-to-end extraction performance and speed, a fully functional web interface, and state-of-the-art deep learning methods for location extraction.
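The abstract motivates enrichment by the need for location detail finer than country level. As a rough illustration of that gap, the sketch below reads the location qualifier from a GenBank flat-file record and flags records that carry only a country. It is not GeoBoost2's implementation and does not reproduce its NLP or deep learning extraction. The function names `parse_location` and `needs_enrichment` are hypothetical. The rule "country-only means enrichment is needed" is an illustrative assumption. The sketch also assumes the INSDC convention `"Country: finer location"` in the `/country` qualifier or its newer replacement `/geo_loc_name`.

```python
import re
from typing import Optional, Tuple

# Matches the location qualifier in a GenBank flat-file feature table.
# Older records use /country; newer records use /geo_loc_name.
# [^"]* also spans newlines, so values wrapped across lines are captured.
_LOC_RE = re.compile(r'/(?:country|geo_loc_name)="([^"]*)"')


def parse_location(record_text: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (country, finer detail or None) from the first location qualifier.

    Returns None when the record has no location qualifier at all.
    """
    match = _LOC_RE.search(record_text)
    if match is None:
        return None
    # Collapse the indentation/newlines left by flat-file line wrapping.
    value = " ".join(match.group(1).split())
    # Assumed convention: "Country" or "Country: region, locality".
    country, sep, detail = value.partition(":")
    detail = detail.strip() if sep else ""
    return country.strip(), detail or None


def needs_enrichment(record_text: str) -> bool:
    """Hypothetical heuristic: True if the record lacks sub-country location detail."""
    loc = parse_location(record_text)
    return loc is None or loc[1] is None


if __name__ == "__main__":
    record = '''FEATURES             Location/Qualifiers
     source          1..1701
                     /organism="Influenza A virus"
                     /country="USA: Arizona"
'''
    print(parse_location(record))    # ('USA', 'Arizona')
    print(needs_enrichment(record))  # False
```

A record such as `/geo_loc_name="Mexico"` parses to `('Mexico', None)` and is flagged. In a pipeline like the one described above, a flagged record is the kind whose location a text-mining step would try to refine.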

Citation (APA)

Magge, A., Weissenbacher, D., O’Connor, K., Tahsin, T., Gonzalez-Hernandez, G., & Scotch, M. (2020). GeoBoost2: A natural language processing pipeline for GenBank metadata enrichment for virus phylogeography. Bioinformatics, 36(20), 5120–5121. https://doi.org/10.1093/bioinformatics/btaa647