ECETR-extended content extraction via tag ratios

ISSN: 22773878

Abstract

The usual way for a common Internet user to search the contents of the World Wide Web is through web query interfaces. Despite the enormous use of the Internet to find desired information around the world, collecting important information from multiple web pages remains a difficult problem. Many web content extraction systems have been proposed to extract the desired information from web pages, including manually constructed, supervised, and semi-supervised systems. The content extraction techniques developed so far include Document Object Model (DOM) trees, text density, tag ratios, and algorithms based on visual information. This paper proposes a novel web content extraction method that uses tag ratios combined with clustering. The proposed system is able to extract 85%-90% of the user-relevant information.
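The abstract does not give ECETR's exact formulas or its clustering method, so the following is only a minimal sketch of the general tag-ratio idea, not the authors' implementation. It assumes that each line of HTML is scored as (number of non-tag characters) / max(number of tags, 1), that the scores are smoothed with a simple moving average, and that a 1-D two-means clustering splits lines into "content" and "boilerplate". The function names (`tag_ratios`, `smooth`, `two_means`, `extract_content`) and the `radius` parameter are hypothetical.

```python
import re

# Hypothetical, simplified tag matcher; real HTML may need a proper parser.
TAG_RE = re.compile(r"<[^>]*>")


def tag_ratios(html: str) -> list[float]:
    """Per-line ratio of text characters to tags (tag count floored at 1)."""
    ratios = []
    for line in html.splitlines():
        n_tags = len(TAG_RE.findall(line))
        text = TAG_RE.sub("", line).strip()
        ratios.append(len(text) / max(n_tags, 1))
    return ratios


def smooth(values: list[float], radius: int = 1) -> list[float]:
    """Moving average over a window of +/- radius lines (assumed smoothing)."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out


def two_means(values: list[float], iters: int = 20) -> tuple[list[int], list[float]]:
    """1-D k-means with k=2; centroids start at the min and max values."""
    centroids = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
                  for v in values]
        for c in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids


def extract_content(html: str, radius: int = 1) -> list[str]:
    """Return the text of lines that fall in the high-ratio cluster."""
    lines = html.splitlines()
    if not lines:
        return []
    labels, centroids = two_means(smooth(tag_ratios(html), radius))
    content = 1 if centroids[1] > centroids[0] else 0
    texts = [TAG_RE.sub("", line).strip() for line in lines]
    return [t for t, lab in zip(texts, labels) if lab == content and t]
```

Lines made mostly of markup (navigation menus, footers) get low ratios, while paragraphs of prose get high ones; smoothing lets a line borrow evidence from its neighbours before clustering. A real system would likely tune the smoothing window and the number of clusters, which this sketch fixes at `radius=1` and two clusters.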

Citation (APA):

Ashok Kumar, R., & Rama Devi, Y. (2019). ECETR-extended content extraction via tag ratios. International Journal of Recent Technology and Engineering, 7(6), 158–160.