OBJECTIVES: Besides keyword search, navigational search is an important means of finding relevant information in digital object collections. Such navigation is often supported by categorization systems or thesauri, which provide a hierarchical view of a particular domain and allow users to browse digital collections. Existing categorization systems, however, require substantial and expensive manual effort to create and maintain. Our Semantic GrowBag algorithm creates concept graphs fully automatically, i.e., directed graphs similar to categorization systems but without strong subsumption semantics. This article sketches our algorithm and evaluates it for the medical domain.
METHODS: Our Semantic GrowBag algorithm uses descriptive keywords and exploits higher-order co-occurrences between them to create concept graphs (so-called GrowBag graphs) from annotated object collections. In this study, we have automatically created more than 2000 GrowBag graphs based on the Medline data set to show the applicability of our algorithm in the medical domain. For the evaluation, we first compared our algorithm to a baseline algorithm that does not take higher-order co-occurrences into account, and then compared the resulting GrowBag graphs systematically against the manually crafted MeSH thesaurus.
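The notion of higher-order co-occurrence can be illustrated with a minimal sketch. It is not the Semantic GrowBag algorithm itself, whose details are not given here; it only assumes that each object is annotated with a set of keywords, and it finds keyword pairs that never annotate the same object but share a directly co-occurring keyword. The function names and the sample annotations are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def first_order(docs):
    """Count how often two keywords annotate the same object."""
    counts = defaultdict(int)
    for keywords in docs:
        for a, b in combinations(sorted(set(keywords)), 2):
            counts[(a, b)] += 1
    return counts


def neighbours(counts):
    """Map each keyword to the keywords it directly co-occurs with."""
    adj = defaultdict(set)
    for a, b in counts:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def second_order(counts):
    """Find pairs with no direct co-occurrence but a shared neighbour."""
    adj = neighbours(counts)
    pairs = {}
    for a, b in combinations(sorted(adj), 2):
        if b in adj[a]:
            continue  # already related at first order
        shared = adj[a] & adj[b]
        if shared:
            pairs[(a, b)] = sorted(shared)
    return pairs


# Hypothetical annotations, invented for illustration only.
docs = [
    ["asthma", "inhaler"],
    ["asthma", "bronchitis"],
    ["bronchitis", "cough"],
]
print(second_order(first_order(docs)))
```

In this toy input, 'asthma' and 'cough' never annotate the same object, yet both co-occur with 'bronchitis'; a method restricted to first-order co-occurrences would miss that link.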
RESULTS: Our experiments revealed that the Semantic GrowBag approach increases the number of relevant relationships by about 50% compared with the baseline approach. Furthermore, the identified relations usually correspond to, and hardly ever contradict, the relationships stated by MeSH.
CONCLUSIONS: The Semantic GrowBag algorithm creates concept graphs fully automatically. While it does not systematically exploit specifics of a domain (such as the fundamental separation between 'drugs' and 'therapy' in MeSH), the resulting GrowBag graphs are nevertheless well suited to supporting navigation in digital object collections. Moreover, they can also help maintain existing categorization systems based on the actual usage of categories.