A large-scale Web data collection as a natural language processing infrastructure

Abstract

In recent years, language resources acquired from the Web have been released, and these data have improved the performance of applications in several NLP tasks. Although language resources organized at the web-page level are useful in NLP tasks and applications such as knowledge acquisition, document retrieval, and document summarization, no such resources have been released so far. In this paper, we propose a data format for the results of web page processing, and a search engine infrastructure that makes it possible to share approximately 100 million Japanese web pages. By obtaining this web data, NLP researchers can begin their own processing immediately without having to analyze web pages themselves.

Cite

Shinzato, K., Kawahara, D., Hashimoto, C., & Kurohashi, S. (2008). A large-scale Web data collection as a natural language processing infrastructure. In Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008 (pp. 2236–2241). European Language Resources Association (ELRA).