Enriching domain-specific language models using domain independent WWW n-gram corpus

Harry Chang

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2012), Vol. 7268 LNAI (Part 2), pp. 38-46

DOI: 10.1007/978-3-642-29350-4_5

Abstract

This paper describes new techniques for extracting and computing the domain-specific knowledge implicitly embedded in a highly structured, ontology-based information system for a TV Electronic Programming Guide (EPG). The domain knowledge, represented by a set of mutually related n-gram data sets, is then enriched by exploiting the explicit structural dependencies and implicit semantic associations between the data entities in the domain and the domain-independent text of the Google 1-trillion-word 5-gram corpus, which was created from general WWW documents. The knowledge-based enrichment process creates the language models required for a natural-language EPG search system. These models outperform a baseline model built only from the original EPG data source by a significant margin: an absolute improvement of 14.1% in model coverage (recall accuracy), measured on large-scale test data collected from a real-world EPG search application. © 2012 Springer-Verlag Berlin Heidelberg.
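To make the coverage comparison concrete, here is a toy sketch. It is not the paper's method: the bigram order, the rule of keeping a web n-gram when it shares at least one word with the domain vocabulary and its count reaches a hypothetical `min_count`, the definition of coverage as "fraction of queries whose every n-gram is in the model", and all titles, counts, and queries are illustrative assumptions. The helper names (`build_domain_model`, `enrich`, `coverage`) are likewise hypothetical.

```python
"""Toy sketch (hypothetical data and rules): enrich a domain n-gram set with
general-web n-gram counts, then compare query coverage before and after."""

from collections.abc import Iterable

Gram = tuple[str, ...]


def ngrams(text: str, n: int = 2) -> list[Gram]:
    """Lower-case, split on whitespace, and return the contiguous n-grams."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_domain_model(entries: Iterable[str], n: int = 2) -> set[Gram]:
    """Collect every n-gram that appears in the domain entries (e.g. EPG titles)."""
    model: set[Gram] = set()
    for entry in entries:
        model.update(ngrams(entry, n))
    return model


def enrich(model: set[Gram], web_counts: dict[Gram, int], min_count: int) -> set[Gram]:
    """Assumed rule: add web n-grams that are frequent enough and share
    at least one word with the domain vocabulary."""
    vocab = {word for gram in model for word in gram}
    added = {
        gram
        for gram, count in web_counts.items()
        if count >= min_count and any(word in vocab for word in gram)
    }
    return model | added


def coverage(model: set[Gram], queries: list[str], n: int = 2) -> float:
    """Assumed metric: fraction of queries whose n-grams are all in the model."""
    covered = 0
    for query in queries:
        grams = ngrams(query, n)
        if grams and all(g in model for g in grams):
            covered += 1
    return covered / len(queries)


# Hypothetical domain titles, web n-gram counts, and test queries.
titles = ["The Big Bang Theory", "Modern Family", "House Hunters"]
web_counts = {
    ("watch", "modern"): 900,
    ("find", "house"): 600,
    ("big", "bang"): 3000,
    ("family", "guy"): 40,       # shares a domain word but is below min_count
    ("stock", "prices"): 7000,   # frequent but unrelated to the domain vocabulary
}
queries = ["watch modern family", "big bang theory", "find house hunters", "family guy"]

baseline = build_domain_model(titles)
enriched = enrich(baseline, web_counts, min_count=100)
baseline_cov = coverage(baseline, queries)
enriched_cov = coverage(enriched, queries)
print(f"baseline {baseline_cov:.2f}, enriched {enriched_cov:.2f}, "
      f"absolute gain {enriched_cov - baseline_cov:.2f}")
```

On this toy data the script prints `baseline 0.25, enriched 0.75, absolute gain 0.50`. The gain is reported as an absolute difference in coverage, the same kind of figure as the 14.1% quoted in the abstract, but the numbers here say nothing about the paper's actual results.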

Citation (APA)

Chang, H. (2012). Enriching domain-specific language models using domain independent WWW n-gram corpus. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7268 LNAI, pp. 38–46). Springer Verlag. https://doi.org/10.1007/978-3-642-29350-4_5