For many applications, the query speed of N-gram language models is a computational bottleneck. Although massively parallel hardware like GPUs offers a potential solution to this bottleneck, exploiting this hardware requires a careful rethinking of basic algorithms and data structures. We present the first language model designed for such hardware, using B-trees to maximize data parallelism and minimize memory footprint and latency. Compared with a single-threaded instance of KenLM (Heafield, 2011), a highly optimized CPU-based language model, our GPU implementation produces identical results with a smaller memory footprint and a sixfold increase in throughput on a batch query task. When we saturate both devices, the GPU delivers nearly twice the throughput per hardware dollar even when the CPU implementation uses faster data structures.
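To make the data-parallel idea concrete, the sketch below shows one way a batched B-tree lookup can be expressed on a GPU: the tree is flattened into an array of fixed-width nodes, and each CUDA thread resolves one query independently. This is an illustrative reconstruction, not the paper's implementation; the node layout, the names (Node, btree_lookup, K), the sentinel value, and the toy data are all assumptions introduced here for the example.

#include <cstdio>
#include <cstdint>

// Hypothetical flat-array B-tree: each node holds up to K sorted keys
// (e.g. hashed n-gram identifiers), one value per key (e.g. a
// log-probability), and K+1 child indices into the same node array
// (-1 marks "no child"). This layout is an assumption for illustration.
constexpr int K = 4;

struct Node {
    int      nkeys;         // number of valid keys in this node
    uint64_t keys[K];       // sorted key slots
    float    vals[K];       // value stored with each key
    int      child[K + 1];  // child node indices, -1 if absent
};

// One thread per query: each thread descends from the root independently,
// scanning the keys of each node it visits. This is the batched,
// data-parallel access pattern the abstract alludes to.
__global__ void btree_lookup(const Node* nodes, int root,
                             const uint64_t* queries, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    uint64_t q = queries[i];
    float result = -99.0f;  // sentinel meaning "key not found"
    for (int cur = root; cur != -1; ) {
        const Node& nd = nodes[cur];
        int j = 0;
        while (j < nd.nkeys && nd.keys[j] < q) ++j;   // scan within the node
        if (j < nd.nkeys && nd.keys[j] == q) { result = nd.vals[j]; break; }
        cur = nd.child[j];                            // descend one level
    }
    out[i] = result;
}

int main() {
    // Tiny hand-built tree (made-up data): a root with two leaves.
    Node h_nodes[3] = {};
    h_nodes[0] = { 2, {10, 20}, {-1.0f, -2.0f}, { 1,  2, -1, -1, -1} };  // root
    h_nodes[1] = { 2, { 3,  7}, {-3.0f, -3.5f}, {-1, -1, -1, -1, -1} };  // keys < 10
    h_nodes[2] = { 2, {12, 15}, {-4.0f, -4.5f}, {-1, -1, -1, -1, -1} };  // 10..20

    uint64_t h_q[4] = {7, 15, 20, 99};  // batch of queries; 99 is absent
    Node* d_nodes; uint64_t* d_q; float* d_out;
    cudaMalloc(&d_nodes, sizeof(h_nodes));
    cudaMalloc(&d_q, sizeof(h_q));
    cudaMalloc(&d_out, 4 * sizeof(float));
    cudaMemcpy(d_nodes, h_nodes, sizeof(h_nodes), cudaMemcpyHostToDevice);
    cudaMemcpy(d_q, h_q, sizeof(h_q), cudaMemcpyHostToDevice);

    btree_lookup<<<1, 4>>>(d_nodes, 0, d_q, d_out, 4);

    float h_out[4];
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    for (int i = 0; i < 4; ++i)
        printf("query %llu -> %f\n", (unsigned long long)h_q[i], h_out[i]);
    cudaFree(d_nodes); cudaFree(d_q); cudaFree(d_out);
    return 0;
}

The within-node linear scan is kept deliberately simple: with small node widths the per-node work is negligible, and the benefit of a B-tree in this setting comes from its shallow depth (few dependent memory accesses per query) and from servicing many queries concurrently, which matches the latency and memory-footprint trade-off the abstract describes.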
CITATION
Bogoychev, N., & Lopez, A. (2016). N-gram language models for massively parallel devices. In 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Long Papers (Vol. 4, pp. 1944–1953). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/p16-1183