Specimen Data Refinery: A landscape analysis on machine learning, computer vision and automated approaches to capture specimen metadata

Laurence Livermore; Robert Cubey

Journal Article, Open Access

Biodiversity Information Science and Standards (2019) 3

DOI: 10.3897/biss.3.37647

Abstract

Capturing data from specimen images is the most viable way of enriching specimen metadata cheaply and quickly compared to traditional digitisation. Advances in machine learning and computer vision-based tools, and their increasing accessibility and affordability, are greatly increasing the potential to take automated measurements and capture other data from specimens themselves, as well as to transcribe label data.

More sophisticated segmentation of images allows us to find parts of interest: particular labels; individual specimens on a slide; or barcodes. Following segmentation, there is the potential to use colour analysis of specimens to perform conditional checking, such as looking for bad cases of verdigris in pinned insects or discoloration of gum-chloral mountant. Automating measurements and landmark analysis of specimens can be used to create trait datasets, all of which will enrich our knowledge of specimens. Segmentation of labels can allow us to cluster similar labels based on their visual properties, including colour, shape and patterns; this in turn can be used to make optical character recognition, handwriting recognition and manual transcription much more efficient. Atomising, validating and resolving label data will create structured label data that can be more easily stored, searched and linked to other datasets.

We present a landscape analysis on the approaches, summarising previous work, and outline our plan to build future tools and systems in the SYNTHESYS+ Project as part of the Specimen Data Refinery. This will cover the sharing of tools, reducing barriers to access, integrating workflow engines into a software architecture that allows the components to be re-used and re-purposed with provenance data for repeatability, and conforms with the Findability, Accessibility, Interoperability, and Reusability (FAIR) principles (Wilkinson et al. 2016).
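To make the idea of clustering similar labels by their visual properties more concrete, the following is a minimal sketch and not the authors' method. It assumes that segmentation has already produced one crop per label and that each crop has been summarised by its mean colour and pixel dimensions. The `LabelCrop` type, the `features` and `cluster_labels` functions, and the distance threshold are all hypothetical names and values chosen for illustration; a real pipeline would likely use richer features and an established clustering library.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class LabelCrop:
    """Hypothetical summary of one segmented label image."""
    label_id: str
    mean_rgb: tuple  # average colour of the crop, each channel 0-255
    width: int       # crop width in pixels
    height: int      # crop height in pixels


def features(crop):
    """Map a crop to a small feature vector with every component in 0..1."""
    r, g, b = crop.mean_rgb
    aspect = crop.width / crop.height
    # Squash the aspect ratio into 0..1 so it is comparable to the colour channels.
    return (r / 255, g / 255, b / 255, aspect / (1 + aspect))


def cluster_labels(crops, threshold=0.15):
    """Greedy clustering: join the first cluster whose centroid is close enough.

    The threshold is an assumed value for illustration, not a tuned parameter.
    Returns a list of clusters, each a list of label ids in input order.
    """
    clusters = []
    for crop in crops:
        f = features(crop)
        for cluster in clusters:
            if dist(cluster["centroid"], f) <= threshold:
                cluster["members"].append(crop)
                n = len(cluster["members"])
                # Update the running mean of the cluster centroid.
                cluster["centroid"] = tuple(
                    c + (x - c) / n for c, x in zip(cluster["centroid"], f)
                )
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append({"centroid": f, "members": [crop]})
    return [[m.label_id for m in c["members"]] for c in clusters]


# Made-up example data: two cream rectangular labels and two red square ones.
example_crops = [
    LabelCrop("a", (250, 245, 230), 300, 100),
    LabelCrop("c", (200, 60, 50), 100, 100),
    LabelCrop("b", (248, 240, 228), 310, 100),
    LabelCrop("d", (205, 65, 55), 98, 100),
]
example_clusters = cluster_labels(example_crops)
print(example_clusters)
```

Grouping labels this way would let a transcriber, or an optical character recognition model, handle one visually consistent batch at a time, which is the efficiency gain the abstract describes.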

Cite

Livermore, L., & Cubey, R. (2019). Specimen Data Refinery: A landscape analysis on machine learning, computer vision and automated approaches to capture specimen metadata. Biodiversity Information Science and Standards, 3. https://doi.org/10.3897/biss.3.37647
