In recent years the growth of the World Wide Web has exceeded all expectations. Today several billion HTML documents, pictures and other multimedia files are available via the Internet, and the number is still rising. Given the impressive variety of the Web, however, retrieving interesting content has become a very difficult task. Web Content Mining applies the ideas and principles of data mining and knowledge discovery to sift out more specific information. Unfortunately, using the Web as an information source is more complex than working with static databases. Because of the Web's highly dynamic nature and its vast number of documents, new solutions are needed that do not depend on access to the complete data at the outset. Another important aspect is the presentation of query results. Because the Web is so large, a single query can return thousands of pages. Meaningful methods for presenting such large result sets are therefore necessary to help users select the most interesting content. In this chapter we discuss several basic topics: Web document representation, Web search, short text processing, topic extraction and Web opinion mining.
Xu, G., Zhang, Y., & Li, L. (2011). Web Content Mining. In Web Mining and Social Networking (pp. 71–87). Springer US. https://doi.org/10.1007/978-1-4419-7735-9_4