Enriching domain-specific language models using domain independent WWW n-gram corpus

Abstract

This paper describes new techniques for extracting and computing the domain-specific knowledge implicitly embedded in a highly structured, ontology-based information system for a TV Electronic Programming Guide (EPG). The domain knowledge is represented as a set of mutually related n-gram data sets. It is then enriched by exploiting both the explicit structural dependencies and the implicit semantic associations between the data entities in the domain and the domain-independent text of the Google 1 trillion 5-gram corpus, which was created from general WWW documents. This knowledge-based enrichment process creates the language models required for a natural-language EPG search system. These models outperform the baseline model, which is built only from the original EPG data source, by a significant margin: an absolute improvement of 14.1% in model coverage (recall accuracy), measured on large-scale test data collected from a real-world EPG search application. © 2012 Springer-Verlag Berlin Heidelberg.
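The abstract does not spell out the enrichment procedure, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a deliberately simple selection rule (keep a web n-gram if it shares at least one token with the domain vocabulary) and measures coverage as the fraction of test-query bigrams found in the model. The function names, the `min_overlap` parameter, and the toy data are all hypothetical.

```python
from typing import Iterable, Set, Tuple

NGram = Tuple[str, ...]


def extract_ngrams(text: str, max_n: int = 5) -> Set[NGram]:
    """Return every n-gram of order 1..max_n in a whitespace-tokenized string."""
    tokens = text.lower().split()
    return {
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def build_model(texts: Iterable[str], max_n: int = 5) -> Set[NGram]:
    """Baseline 'model': the set of n-grams seen in the domain (EPG) texts."""
    model: Set[NGram] = set()
    for text in texts:
        model |= extract_ngrams(text, max_n)
    return model


def enrich(domain_model: Set[NGram], web_ngrams: Iterable[NGram],
           min_overlap: int = 1) -> Set[NGram]:
    """Add web n-grams that share >= min_overlap tokens with the domain vocabulary.

    Assumption: token overlap stands in for the structural and semantic
    association described in the abstract; the real criterion is not given there.
    """
    domain_vocab = {gram[0] for gram in domain_model if len(gram) == 1}
    enriched = set(domain_model)
    for gram in web_ngrams:
        if sum(tok in domain_vocab for tok in gram) >= min_overlap:
            # Keep the n-gram together with all of its lower-order sub-n-grams.
            enriched |= extract_ngrams(" ".join(gram), max_n=len(gram))
    return enriched


def coverage(model: Set[NGram], queries: Iterable[str], n: int = 2) -> float:
    """Fraction of order-n n-grams in the test queries that the model contains."""
    total = hits = 0
    for query in queries:
        tokens = query.lower().split()
        for i in range(len(tokens) - n + 1):
            total += 1
            hits += tuple(tokens[i:i + n]) in model
    return hits / total if total else 0.0


# Toy data, invented for illustration only.
epg_titles = ["late night movie", "morning news"]
web_5grams = [
    ("watch", "the", "late", "night", "show"),
    ("stock", "market", "closes", "higher", "today"),
]
test_queries = ["watch the late night movie", "morning news today"]

baseline = build_model(epg_titles)
enriched = enrich(baseline, web_5grams)
print("baseline coverage:", coverage(baseline, test_queries))
print("enriched coverage:", coverage(enriched, test_queries))
```

On this toy data the first web 5-gram is kept because it overlaps the domain vocabulary, the second is discarded, and bigram coverage of the test queries rises accordingly. A real system would work with n-gram counts and smoothed probabilities rather than plain set membership.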

Chang, H. (2012). Enriching domain-specific language models using domain independent WWW n-gram corpus. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7268 LNAI, pp. 38–46). Springer Verlag. https://doi.org/10.1007/978-3-642-29350-4_5
