Universal web pages content parser

Abstract

This article describes the universal web pages content parser, a cross-platform application that enhances the process of extracting data from web pages. The key elements of this implementation were a user-friendly interface, the possibility of significant automation, and the reusability of already created patterns. Moreover, the original approach to parsing not well-formed HTML, which forms the application's core, is presented in detail. The universal web pages content parser shows that a simplified web scraping utility can be made available to the masses, and that not well-formed HTML sources can feed useful tree-like data structures just as well as well-formed ones. © 2012 Springer-Verlag.
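
The abstract does not spell out how the application turns not well-formed HTML into a tree, so the following is only a minimal sketch of one common recovery strategy and should not be read as the authors' method. It uses Python's standard `html.parser` module; the names `Node`, `TolerantTreeBuilder`, and `parse` are hypothetical helpers introduced for illustration. The assumed recovery rules are: a closing tag implicitly closes any elements left open inside it, and a closing tag with no matching open element is ignored.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    """One element of the recovered tree (hypothetical helper type)."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []  # child Node objects, in document order
        self.text = []      # text fragments found directly inside this element

    def find_all(self, tag):
        """Yield every descendant element with the given tag name."""
        for child in self.children:
            if child.tag == tag:
                yield child
            yield from child.find_all(tag)


class TolerantTreeBuilder(HTMLParser):
    """Builds a Node tree and recovers from missing or stray closing tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node  # descend into the newly opened element

    def handle_endtag(self, tag):
        # Look upward for the nearest open element with this name.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is self.root:
            return  # stray closing tag (assumed rule: ignore it)
        # Implicitly close everything that was left open inside it.
        self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text.append(data.strip())


def parse(html):
    """Parse possibly malformed HTML and return the root of the tree."""
    builder = TolerantTreeBuilder()
    builder.feed(html)
    builder.close()  # flushes any text still buffered by the parser
    return builder.root


# Unclosed <li> and <p> elements plus a stray </span>.
root = parse("<ul><li>One<li>Two</ul></span><p>Tail")
for item in root.find_all("li"):
    print(item.tag, item.text)
```

Because this sketch does not model HTML's implicit-close rules, the second `<li>` ends up nested inside the first rather than beside it; both items are still reachable through `find_all`, which is usually enough for pattern-based extraction. A production tool would more likely rely on a full HTML5 tree-construction algorithm.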

Cite

Pawlas, P., Domański, A., & Domańska, J. (2012). Universal web pages content parser. In Communications in Computer and Information Science (Vol. 291 CCIS, pp. 130–138). https://doi.org/10.1007/978-3-642-31217-5_14
