Abstract
The Hazardous Substances Data Bank (HSDB), a factual data file produced and maintained by the Specialized Information Services (SIS) Division of the National Library of Medicine (NLM), contains over 4,600 records on potentially hazardous chemicals. To improve information retrieval from HSDB, SIS is developing an automated indexing protocol in collaboration with NLM's Indexing Initiative group. The Indexing Initiative investigates methods by which automated indexing may partially or completely substitute for human indexing. Three main methodologies are applied: the MetaMap Indexing method, which maps text to concepts in the Unified Medical Language System (UMLS) Metathesaurus; the Trigram Phrase Matching method, which uses character trigrams to match text to Metathesaurus concepts; and a variant of the PubMed Related Citations method, which finds MeSH terms related to the input text. The UMLS concepts generated by the first two methods are mapped to MeSH main headings through the Restrict-to-MeSH algorithm. The resulting MeSH terms are then clustered into a ranked list of recommended indexing terms. This poster presents our experience in applying these automated indexing methodologies to a large data file with highly structured records, a variety of text and data formats, and complex technical and biomedical terminology.
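As a rough illustration of two ideas named in the abstract, matching text to vocabulary terms by character trigrams and combining the output of several methods into one ranked list, the following minimal sketch may help. It is not the Indexing Initiative's implementation: the abstract does not specify a similarity measure or a clustering procedure, so the Dice coefficient and the weighted sum used here are stand-ins, and every name (`char_trigrams`, `trigram_match`, `merge_rankings`, the `threshold` and `weights` parameters) is hypothetical.

```python
from collections import defaultdict


def char_trigrams(text):
    """Return the set of character trigrams of a lowercased, space-padded string."""
    padded = f"  {text.lower().strip()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def dice(a, b):
    """Dice coefficient between two trigram sets (assumed similarity measure)."""
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def trigram_match(phrase, vocabulary, threshold=0.5):
    """Score each vocabulary term against the phrase; keep scores >= threshold."""
    grams = char_trigrams(phrase)
    scores = {term: dice(grams, char_trigrams(term)) for term in vocabulary}
    return {term: s for term, s in scores.items() if s >= threshold}


def merge_rankings(method_scores, weights):
    """Combine per-method term scores into one ranked list by weighted sum.

    This stands in for the clustering step, whose details the abstract omits.
    """
    combined = defaultdict(float)
    for method, scores in method_scores.items():
        for term, score in scores.items():
            combined[term] += weights.get(method, 1.0) * score
    # Highest combined score first; ties broken alphabetically.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, calling `trigram_match("benzenes", ["Benzene", "Toluene"])` keeps "Benzene" and drops "Toluene", because the plural form shares most of its trigrams with the former and almost none with the latter. The per-method score dictionaries can then be passed to `merge_rankings` to obtain a single ordered list of candidate terms.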
Cite
Nuss, C., Chang, H. F., Moore, D., & Fonger, G. C. (2003). Automated indexing of the Hazardous Substances Data Bank. In Proceedings of the ASIST Annual Meeting (Vol. 40, pp. 537–539). John Wiley and Sons Inc. https://doi.org/10.1002/meet.14504001112