Construction of a Blog Corpus with Syntactic, Anaphoric, and Sentiment Annotations

  • Hashimoto C
  • Kurohashi S
  • Kawahara D
  • et al.

Abstract

Interest in information access and analysis technologies targeting blog articles has been growing. To provide the research community with basic data, we constructed a blog corpus that consists of 249 articles (4,186 sentences) and has the following features: i) It is annotated with sentence boundaries. ii) It is annotated with grammatical information about morphology, dependency, case, anaphora, and named entities, in a way consistent with the Kyoto University Text Corpus. iii) It is annotated with sentiment information. iv) It is provided with HTML files that visualize all of the above annotations. We asked 81 university students to write blog articles about one of four topics: sightseeing in Kyoto, cellphones, sports, or gourmet food. In constructing the annotated blog corpus, we faced problems concerning sentence boundaries, parentheses, errata, dialect, a variety of smileys, and other morphological variations. In this paper, we describe the specification of the corpus and how we dealt with these problems.

Cite

Hashimoto, C., Kurohashi, S., Kawahara, D., Shinzato, K., & Nagata, M. (2011). Construction of a Blog Corpus with Syntactic, Anaphoric, and Sentiment Annotations. Journal of Natural Language Processing, 18(2), 175–201. https://doi.org/10.5715/jnlp.18.175
