HORNET: Enriching Pre-trained Language Representations with Heterogeneous Knowledge Sources


Abstract

Knowledge-Enhanced Pre-trained Language Models (KEPLMs) improve the language understanding abilities of deep language models by leveraging rich semantic knowledge from knowledge graphs in addition to plain pre-training texts. However, previous efforts mostly use homogeneous knowledge (especially structured relation triples in knowledge graphs) to enhance the context-aware representations of entity mentions, so their performance may be limited by the coverage of knowledge graphs. It is also unclear whether these KEPLMs truly understand the injected semantic knowledge, due to the "black-box" training mechanism. In this paper, we propose a novel KEPLM named HORNET, which integrates Heterogeneous knowledge from various structured and unstructured sources into the RoBERTa NETwork and hence takes full advantage of both linguistic and factual knowledge simultaneously. Specifically, we design a hybrid attention heterogeneous graph convolution network (HaHGCN) to learn heterogeneous knowledge representations from the structured relation triples in knowledge graphs and the unstructured entity description texts. Meanwhile, we propose explicit dual knowledge understanding tasks to induce a more effective infusion of the heterogeneous knowledge, helping the model learn the complicated mappings from the knowledge graph embedding space to the deep context-aware embedding space and vice versa. Experiments show that our HORNET model outperforms various KEPLM baselines on knowledge-aware tasks, including knowledge probing, entity typing, and relation extraction. Our model also achieves substantial improvements on several GLUE benchmark datasets compared to other KEPLMs.
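To make the fusion idea in the abstract concrete, below is a minimal PyTorch sketch of a hybrid-attention heterogeneous graph convolution layer that combines an aggregate of triple-based KG neighbors with an entity-description embedding. All class and variable names, dimensions, and the mean-pooling aggregation are illustrative assumptions; this is not the authors' HaHGCN implementation, whose exact architecture is not given in this record.

```python
# Illustrative sketch only: a hybrid-attention layer fusing structured
# (KG triple) and unstructured (entity description) knowledge sources.
# Names, shapes, and the fusion scheme are assumptions, not HORNET's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridAttentionHeteroLayer(nn.Module):
    """Fuses triple-based neighbor aggregates and description embeddings
    for each entity node via attention conditioned on the entity itself."""
    def __init__(self, dim: int):
        super().__init__()
        self.w_struct = nn.Linear(dim, dim)  # transform structured neighbors
        self.w_text = nn.Linear(dim, dim)    # transform description embedding
        self.attn = nn.Linear(2 * dim, 1)    # scores each knowledge source

    def forward(self, entity, struct_neighbors, desc_emb):
        # entity: (N, d); struct_neighbors: (N, K, d); desc_emb: (N, d)
        # 1) aggregate structured neighbors (mean pooling stands in for a
        #    relation-aware GCN aggregation)
        h_struct = self.w_struct(struct_neighbors.mean(dim=1))
        # 2) project the unstructured description embedding
        h_text = self.w_text(desc_emb)
        # 3) hybrid attention: weight each source given the entity state
        scores = torch.stack([
            self.attn(torch.cat([entity, h_struct], dim=-1)),
            self.attn(torch.cat([entity, h_text], dim=-1)),
        ], dim=1)                            # (N, 2, 1)
        alpha = F.softmax(scores, dim=1)
        # 4) fused heterogeneous representation with a residual connection
        fused = alpha[:, 0] * h_struct + alpha[:, 1] * h_text
        return F.relu(fused + entity)

# Toy usage with random tensors (4 entities, 5 KG neighbors each, dim 8).
layer = HybridAttentionHeteroLayer(dim=8)
out = layer(torch.randn(4, 8), torch.randn(4, 5, 8), torch.randn(4, 8))
print(out.shape)  # torch.Size([4, 8])
```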

Citation (APA)

Zhang, T., Cai, Z., Wang, C., Li, P., Li, Y., Qiu, M., … Huang, J. (2021). HORNET: Enriching Pre-trained Language Representations with Heterogeneous Knowledge Sources. In International Conference on Information and Knowledge Management, Proceedings (pp. 2608–2617). Association for Computing Machinery. https://doi.org/10.1145/3459637.3482436
