Transformer-based language models such as BERT (Devlin et al., 2018) have achieved state-of-the-art performance on various NLP tasks, but are computationally prohibitive. A recent line of work uses various heuristics to successively shorten the sequence length as tokens pass through the encoders, in tasks such as classification and ranking that require only a single token embedding for prediction. We present a novel solution to this problem, called Pyramid-BERT, in which we replace previously used heuristics with a core-set based token selection method justified by theoretical results. The core-set based token selection technique avoids expensive pre-training, enables space-efficient fine-tuning, and thus makes Pyramid-BERT suitable for handling longer sequence lengths. We provide extensive experiments establishing the advantages of Pyramid-BERT over several baselines and existing works on the GLUE benchmarks and the Long Range Arena (Tay et al., 2020) datasets.
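To make the idea concrete, below is a minimal sketch of core-set style token selection between encoder layers. It is not the authors' released implementation; it assumes a greedy k-center (farthest-point) selection under Euclidean distance, a [CLS] token at position 0 that is always retained, and a hypothetical helper name `coreset_select` with a `keep` parameter for the number of tokens to retain.

```python
# Illustrative sketch (not the paper's exact algorithm): greedy k-center
# core-set selection over one layer's token embeddings.
import torch

def coreset_select(hidden: torch.Tensor, keep: int) -> torch.Tensor:
    """Return indices of a core-set of `keep` tokens from one sequence.

    hidden: (seq_len, dim) token embeddings produced by an encoder layer.
    """
    seq_len, _ = hidden.shape
    keep = min(keep, seq_len)
    selected = [0]                      # assumption: always keep the [CLS] token
    # distance of every token to its nearest already-selected token
    dists = torch.cdist(hidden, hidden[selected]).squeeze(-1)
    for _ in range(keep - 1):
        nxt = int(torch.argmax(dists))  # pick the farthest (least covered) token
        selected.append(nxt)
        new_d = torch.cdist(hidden, hidden[nxt:nxt + 1]).squeeze(-1)
        dists = torch.minimum(dists, new_d)
    return torch.tensor(sorted(selected))

# Usage: shrink a 128-token sequence to 32 tokens before the next layer.
h = torch.randn(128, 768)
idx = coreset_select(h, keep=32)
h_reduced = h[idx]                      # (32, 768), fed to the next encoder layer
```

Applying such a selection after successive encoder layers yields the pyramid-shaped reduction in sequence length that the abstract describes, while always preserving the [CLS] embedding used for prediction.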
Citation
Huang, X., Bidart, R., Khetan, A., & Karnin, Z. (2022). Pyramid-BERT: Reducing Complexity via Successive Core-set based Token Selection. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 8798–8817). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.acl-long.602