Similarity of the semantic content of web pages is displayed using interactive graphs that present fragments of minimum spanning trees. Personal homepages are analyzed, parsed into XML documents, and visualized with TouchGraph LinkBrowser, which displays clusters of people who share common interests. The structure of these graphs is strongly affected by the selection of information used to calculate similarity. The influence of simple selection and of Latent Semantic Analysis (LSA) on the structure of such graphs is analyzed. Homepages and lists of publications are converted to word frequency vectors, which are filtered and weighted; the similarity matrix between the normalized vectors is then used to create separate minimum sub-trees showing the clustering of people’s interests. Results show that in this application simple selection of important keywords is as good as LSA, at much lower algorithmic complexity.
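The pipeline described in the abstract (word frequency vectors, weighting, normalization, a similarity matrix, and a minimum spanning tree) can be illustrated with a minimal sketch. Several choices below are assumptions rather than details from the paper: tf-idf is used as the weighting scheme, cosine similarity as the similarity measure, Prim's algorithm to build the tree, and a hypothetical distance threshold `max_dist` to cut the tree into separate sub-trees. The toy documents and all function names are invented for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn token lists into L2-normalized tf-idf vectors (term -> weight dicts).

    Assumption: tf-idf weighting; the abstract does not name the scheme used.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Terms that occur in every document carry no information and are dropped.
        vec = {t: c * math.log(n / df[t]) for t, c in tf.items() if df[t] < n}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def cosine(u, v):
    """Cosine similarity of two already-normalized sparse vectors."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def minimum_spanning_tree(vectors):
    """Prim's algorithm on distance = 1 - cosine; returns edges (i, j, distance)."""
    n = len(vectors)
    if n == 0:
        return []
    dist = [[1.0 - cosine(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]
    # best[j] = (distance to the tree, tree node it attaches to)
    best = {j: (dist[0][j], 0) for j in range(1, n)}
    edges = []
    while best:
        j = min(best, key=lambda k: best[k][0])
        d, i = best.pop(j)
        edges.append((i, j, d))
        for k in best:
            if dist[j][k] < best[k][0]:
                best[k] = (dist[j][k], j)
    return edges


# Toy "homepages", already tokenized (hypothetical data).
pages = {
    "A": "neural networks learning neural".split(),
    "B": "neural networks classification".split(),
    "C": "graph theory trees".split(),
    "D": "graph trees visualization".split(),
}
names = list(pages)
vectors = tfidf_vectors(list(pages.values()))
edges = minimum_spanning_tree(vectors)

# Cutting long edges leaves separate sub-trees (clusters); the threshold is an assumption.
max_dist = 0.9
subtree_edges = [(names[i], names[j]) for i, j, d in edges if d < max_dist]
print(subtree_edges)
```

In this toy example the cut removes the one edge that joins unrelated pages, leaving two sub-trees of pages with shared vocabulary. Replacing `tfidf_vectors` with vectors restricted to selected keywords, or with vectors projected by LSA, would change only the input to `minimum_spanning_tree`, which is the comparison the abstract describes.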
Duch, W., & Matykiewicz, P. (2006). Minimum Spanning Trees Displaying Semantic Similarity. In Intelligent Information Processing and Web Mining (pp. 31–40). Springer-Verlag. https://doi.org/10.1007/3-540-32392-9_4