The literature offers a variety of techniques for building the information extractors on which some data integration systems rely. These techniques are usually based on extraction rules that must be maintained and adapted whenever web sources change. We present our preliminary steps towards an unsupervised information extraction technique that searches web documents for shared patterns and fragments the documents until it finds the relevant information to extract. Experimental results on 1230 real-world web documents demonstrate that our system is fast and achieves promising results. © 2012 Springer-Verlag.
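The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea of fragmenting documents on shared patterns, not the authors' method. It assumes that documents are tokenized into tags and text runs, that a "shared pattern" is the longest contiguous token sequence present in every document, and that fragments sharing no further pattern hold the data to extract. All function names (`tokenize`, `find_at`, `longest_shared_pattern`, `extract`) are hypothetical.

```python
import re
from typing import List, Optional, Tuple

Tokens = Tuple[str, ...]


def tokenize(html: str) -> Tokens:
    """Split markup into tag tokens and text tokens, dropping blank pieces."""
    parts = re.findall(r"<[^>]+>|[^<]+", html)
    return tuple(p.strip() for p in parts if p.strip())


def find_at(seq: Tokens, pat: Tokens) -> int:
    """Return the first index at which pat occurs in seq, or -1."""
    for i in range(len(seq) - len(pat) + 1):
        if seq[i:i + len(pat)] == pat:
            return i
    return -1


def longest_shared_pattern(docs: List[Tokens]) -> Optional[Tokens]:
    """Longest contiguous token run found in every document (brute force)."""
    base = min(docs, key=len)
    for size in range(len(base), 0, -1):
        for start in range(len(base) - size + 1):
            cand = base[start:start + size]
            if all(find_at(d, cand) >= 0 for d in docs):
                return cand
    return None


def extract(docs: List[Tokens]) -> List[List[Tokens]]:
    """Fragment documents on shared patterns; return the leftover groups.

    Each returned group holds one fragment per document. Groups that share
    no further pattern are assumed (for this sketch) to be extracted data.
    """
    if all(len(d) == 0 for d in docs):
        return []
    pattern = longest_shared_pattern(docs)
    if pattern is None:
        return [docs]
    prefixes, suffixes = [], []
    for d in docs:
        i = find_at(d, pattern)
        prefixes.append(d[:i])
        suffixes.append(d[i + len(pattern):])
    return extract(prefixes) + extract(suffixes)


pages = [
    "<html><b>Title:</b> Dune <i>Price:</i> 9.99</html>",
    "<html><b>Title:</b> Emma <i>Price:</i> 4.50</html>",
]
groups = extract([tokenize(p) for p in pages])
print(groups)
```

On the two toy pages above, the sketch yields one group with the titles and one with the prices. A value that happens to be identical in every document would be mistaken for template text, which is one reason a practical system would need more than this brute-force matching.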
Citation
Sleiman, H. A., & Corchuelo, R. (2012). Towards a method for unsupervised web information extraction. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7387 LNCS, pp. 427–430). https://doi.org/10.1007/978-3-642-31753-8_36