Pooling-based recurrent neural architectures consistently outperform their counterparts without pooling on sequence classification tasks. However, the reasons for their enhanced performance are largely unexamined. In this work, we explore three commonly used pooling techniques (mean-pooling, max-pooling, and attention), and propose max-attention, a novel variant that captures interactions among predictive tokens in a sentence. Using novel experiments, we demonstrate that pooling architectures substantially differ from their non-pooling equivalents in their learning ability and positional biases: (i) pooling facilitates better gradient flow than BiLSTMs in initial training epochs, and (ii) BiLSTMs are biased towards tokens at the beginning and end of the input, whereas pooling alleviates this bias. Consequently, we find that pooling yields large gains in low-resource scenarios, and in instances when salient words lie towards the middle of the input. Across several text classification tasks, we find max-attention to frequently outperform other pooling techniques.
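To make the pooling variants named above concrete, the following is a minimal PyTorch sketch of mean-, max-, and attention-pooling over BiLSTM hidden states, together with one plausible reading of max-attention in which the max-pooled vector serves as the attention query. The class name, dimensions, masking details, and the exact max-attention formulation are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch (assumed, not the authors' implementation) of pooling
# variants applied on top of BiLSTM hidden states for text classification.
import torch
import torch.nn as nn


class PooledBiLSTM(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden_dim=128,
                 num_classes=2, pooling="max_attention"):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.bilstm = nn.LSTM(emb_dim, hidden_dim, bidirectional=True,
                              batch_first=True)
        self.pooling = pooling
        # Learned query vector for plain dot-product attention pooling.
        self.attn_query = nn.Parameter(torch.randn(2 * hidden_dim))
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids, mask):
        # token_ids, mask: (batch, seq_len); mask is 1 for real tokens, 0 for padding.
        h, _ = self.bilstm(self.embed(token_ids))        # (batch, seq, 2*hidden)
        mask = mask.unsqueeze(-1).float()                # (batch, seq, 1)

        if self.pooling == "mean":
            # Average hidden states over non-padded positions.
            pooled = (h * mask).sum(1) / mask.sum(1).clamp(min=1)
        elif self.pooling == "max":
            # Dimension-wise max over non-padded positions.
            pooled = h.masked_fill(mask == 0, float("-inf")).max(dim=1).values
        elif self.pooling == "attention":
            # Softmax-weighted sum with a learned query vector.
            scores = (h * self.attn_query).sum(-1)       # (batch, seq)
            scores = scores.masked_fill(mask.squeeze(-1) == 0, float("-inf"))
            weights = torch.softmax(scores, dim=1).unsqueeze(-1)
            pooled = (weights * h).sum(1)
        elif self.pooling == "max_attention":
            # Assumed formulation: attend over hidden states using the
            # max-pooled vector as the query.
            query = h.masked_fill(mask == 0, float("-inf")).max(dim=1).values
            scores = torch.bmm(h, query.unsqueeze(-1)).squeeze(-1)
            scores = scores.masked_fill(mask.squeeze(-1) == 0, float("-inf"))
            weights = torch.softmax(scores, dim=1).unsqueeze(-1)
            pooled = (weights * h).sum(1)
        else:
            raise ValueError(f"unknown pooling: {self.pooling}")

        return self.classifier(pooled)
```

As a usage note, switching the `pooling` argument among `"mean"`, `"max"`, `"attention"`, and `"max_attention"` changes only how the per-token hidden states are aggregated into a single sentence vector; the encoder and classifier stay fixed, which is the comparison the abstract describes.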
Maini, P., Kolluru, K., Pruthi, D., & Mausam. (2020). Why and when should you pool? Analyzing pooling in recurrent architectures. In Findings of the Association for Computational Linguistics: EMNLP 2020 (pp. 4568–4586). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.findings-emnlp.410