Building on the fundamentals and findings of the preceding chapters, this chapter presents hierarchical, centroid-based document management algorithms inspired by the way human librarians classify, sort and catalogue incoming documents. The algorithms use the distance between centroid terms in a local co-occurrence graph as a measure of the documents' semantic closeness, generate (sub)clusters of documents on that basis, and assign them to processing nodes (child nodes created on the fly for this purpose). These centroid-based library management and clustering algorithms are designed to run in a decentralised manner on the peers (the librarians) of a P2P network. The same approach is also used to classify and answer incoming queries and to route them to semantically matching child nodes.
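To make the idea of a centroid-term distance concrete, the following is a minimal Python sketch and not the chapter's actual implementation. It rests on several assumptions that the abstract does not state: co-occurrences are counted per sentence, an edge's length is the reciprocal of its co-occurrence count, a document's centroid term is the graph node with the smallest average shortest-path distance to the document's terms, and two documents are compared by the shortest-path distance between their centroid terms. All function names are hypothetical.

```python
import heapq
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(sentences):
    """Build an undirected graph from sentence-level term co-occurrences.

    Assumption: edge length = 1 / co-occurrence count, so terms that
    co-occur more often are closer together.
    """
    counts = defaultdict(int)
    for sentence in sentences:
        for a, b in combinations(sorted(set(sentence)), 2):
            counts[(a, b)] += 1
    graph = defaultdict(dict)
    for (a, b), n in counts.items():
        graph[a][b] = graph[b][a] = 1.0 / n
    return dict(graph)


def shortest_distances(graph, source):
    """Dijkstra's algorithm: shortest-path length from source to each reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for neighbour, weight in graph[node].items():
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return dist


def centroid_term(graph, document_terms):
    """Return the graph node with the minimal average distance to the document's terms.

    Assumption: the centroid may be any node of the graph, not necessarily
    a term that occurs in the document. Ties are broken alphabetically.
    """
    terms = [t for t in set(document_terms) if t in graph]
    if not terms:
        return None
    best, best_avg = None, float("inf")
    for candidate in sorted(graph):
        dist = shortest_distances(graph, candidate)
        if any(t not in dist for t in terms):
            continue  # candidate cannot reach every document term
        avg = sum(dist[t] for t in terms) / len(terms)
        if avg < best_avg:
            best, best_avg = candidate, avg
    return best


def document_distance(graph, doc_a, doc_b):
    """Distance between two documents, taken as the distance of their centroid terms."""
    ca, cb = centroid_term(graph, doc_a), centroid_term(graph, doc_b)
    if ca is None or cb is None:
        return float("inf")
    return shortest_distances(graph, ca).get(cb, float("inf"))
```

Under these assumptions, a peer acting as a librarian could compare the centroid term of an incoming document or query with the centroid terms of its child nodes and forward the item to the closest one. This routing rule is an illustration only, not a description of the chapter's algorithm.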
Kubek, M. (2020). Centroid-Based Library Management and Document Clustering. In Studies in Big Data (Vol. 62, pp. 103–116). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-23136-1_7