Construction of a Blog Corpus with Syntactic, Anaphoric, and Sentiment Annotations

Chikara Hashimoto; Sadao Kurohashi; Daisuke Kawahara; Keiji Shinzato; Masaaki Nagata

Journal Article (Open Access)

Journal of Natural Language Processing (2011) 18(2) 175-201

DOI: 10.5715/jnlp.18.175

Abstract

Interest in information access and analysis technologies that target blog articles has been growing. To provide the research community with basic data, we constructed a blog corpus of 249 articles (4,186 sentences) with the following features: i) it is annotated with sentence boundaries; ii) it is annotated with grammatical information about morphology, dependency, case, anaphora, and named entities, in a way consistent with the Kyoto University Text Corpus; iii) it is annotated with sentiment information; iv) it comes with HTML files that visualize all of the above annotations. We asked 81 university students to write blog articles on one of four topics: sightseeing in Kyoto, cellphones, sports, or gourmet food. In constructing the annotated blog corpus, we faced problems concerning sentence boundaries, parentheses, errata, dialect, a variety of smileys, and other morphological variations. In this paper, we describe the specification of the corpus and how we dealt with these problems.

Citation (APA)

Hashimoto, C., Kurohashi, S., Kawahara, D., Shinzato, K., & Nagata, M. (2011). Construction of a Blog Corpus with Syntactic, Anaphoric, and Sentiment Annotations. Journal of Natural Language Processing, 18(2), 175–201. https://doi.org/10.5715/jnlp.18.175
