Mining structured objects (data records) based on maximum region detection by text content comparison from website
- ISSN: 2077-1185
At present, a great amount of information on the Web is presented in regularly structured objects, known as data records. A list of such objects on a Web page often describes a list of similar items; for example, a list of products through which a site provides its value-added services. It has therefore become increasingly necessary to develop an effective process for extracting information from them. In this paper, we present a more effective method for this task. The proposed method is able to mine data records not only from a single Web page but also from an entire Web site. Its performance is evaluated against previous methods in the literature, and our experimental results show that the proposed technique outperforms the existing techniques. Finally, we compare the results of the experiments and discuss the performance of the proposed method in mining structured objects (data records).