In recent years, research on big data, data storage, and related innovations in analytics has become increasingly popular. This paper proposes a big web data application and archive for distributed data processing with Apache Hadoop, including a framework of selected methods that can be used with this platform. It outlines a workflow for building a web content mining application and a big data archive using modern technologies such as Python, PHP, JavaScript, MySQL, and cloud services. It also gives an overview of the architecture, methods, and data structures used in web mining, distributed processing, and big data analytics.
Lnenicka, M., Hovad, J., & Komarkova, J. (2015). A proposal of a big web data application and archive for the distributed data processing with Apache Hadoop. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9330 LNCS, pp. 285–294). Springer Verlag. https://doi.org/10.1007/978-3-319-24306-1_28