HORNET: Enriching Pre-trained Language Representations with Heterogeneous Knowledge Sources


Abstract

Knowledge-Enhanced Pre-trained Language Models (KEPLMs) improve the language understanding abilities of deep language models by leveraging rich semantic knowledge from knowledge graphs in addition to plain pre-training texts. However, previous efforts mostly use homogeneous knowledge (especially structured relation triples in knowledge graphs) to enhance the context-aware representations of entity mentions, so their performance may be limited by the coverage of knowledge graphs. It is also unclear whether these KEPLMs truly understand the injected semantic knowledge, due to the "black-box" training mechanism. In this paper, we propose a novel KEPLM named HORNET, which integrates Heterogeneous knowledge from various structured and unstructured sources into the RoBERTa NETwork and hence takes full advantage of both linguistic and factual knowledge simultaneously. Specifically, we design a hybrid attention heterogeneous graph convolution network (HaHGCN) to learn heterogeneous knowledge representations from the structured relation triples in knowledge graphs and the unstructured entity description texts. Meanwhile, we propose explicit dual knowledge understanding tasks to induce a more effective infusion of the heterogeneous knowledge, helping the model learn the complicated mappings from the knowledge graph embedding space to the deep context-aware embedding space and vice versa. Experiments show that our HORNET model outperforms various KEPLM baselines on knowledge-aware tasks, including knowledge probing, entity typing, and relation extraction. Our model also achieves substantial improvements on several GLUE benchmark datasets compared to other KEPLMs.
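To make the fusion idea in the abstract concrete, below is a minimal PyTorch sketch of a hybrid-attention heterogeneous graph convolution layer that combines an aggregate of triple-based KG neighbors with an entity-description embedding. All class and variable names, dimensions, and the mean-pooling aggregation are illustrative assumptions; this is not the authors' HaHGCN implementation, whose exact architecture is not given in this record.

```python
# Illustrative sketch only: a hybrid-attention layer fusing structured
# (KG triple) and unstructured (entity description) knowledge sources.
# Names, shapes, and the fusion scheme are assumptions, not HORNET's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridAttentionHeteroLayer(nn.Module):
    """Fuses triple-based neighbor aggregates and description embeddings
    for each entity node via attention conditioned on the entity itself."""
    def __init__(self, dim: int):
        super().__init__()
        self.w_struct = nn.Linear(dim, dim)  # transform structured neighbors
        self.w_text = nn.Linear(dim, dim)    # transform description embedding
        self.attn = nn.Linear(2 * dim, 1)    # scores each knowledge source

    def forward(self, entity, struct_neighbors, desc_emb):
        # entity: (N, d); struct_neighbors: (N, K, d); desc_emb: (N, d)
        # 1) aggregate structured neighbors (mean pooling stands in for a
        #    relation-aware GCN aggregation)
        h_struct = self.w_struct(struct_neighbors.mean(dim=1))
        # 2) project the unstructured description embedding
        h_text = self.w_text(desc_emb)
        # 3) hybrid attention: weight each source given the entity state
        scores = torch.stack([
            self.attn(torch.cat([entity, h_struct], dim=-1)),
            self.attn(torch.cat([entity, h_text], dim=-1)),
        ], dim=1)                            # (N, 2, 1)
        alpha = F.softmax(scores, dim=1)
        # 4) fused heterogeneous representation with a residual connection
        fused = alpha[:, 0] * h_struct + alpha[:, 1] * h_text
        return F.relu(fused + entity)

# Toy usage with random tensors (4 entities, 5 KG neighbors each, dim 8).
layer = HybridAttentionHeteroLayer(dim=8)
out = layer(torch.randn(4, 8), torch.randn(4, 5, 8), torch.randn(4, 8))
print(out.shape)  # torch.Size([4, 8])
```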

Citation (APA)

Zhang, T., Cai, Z., Wang, C., Li, P., Li, Y., Qiu, M., … Huang, J. (2021). HORNET: Enriching Pre-trained Language Representations with Heterogeneous Knowledge Sources. In International Conference on Information and Knowledge Management, Proceedings (pp. 2608–2617). Association for Computing Machinery. https://doi.org/10.1145/3459637.3482436
