Lossless text compression using GPT-2 language model and Huffman coding

  • Rahman M
  • Hamada M

Abstract

Modern daily-life activities, together with advances in telecommunication, generate enormous amounts of data. Storing this data on digital devices or transmitting it over the Internet is challenging, which makes data compression necessary; research on data compression has therefore become a topic of great interest. Because compressed data is smaller than the original, compression saves storage space and increases transmission speed. In this article, we propose a lossless text compression technique that combines the GPT-2 language model with Huffman coding. In the proposed method, the Burrows-Wheeler transform and a list of keys are first used to reduce the length of the original text file; the GPT-2 language model and Huffman coding are then applied for encoding. We compare the proposed method with state-of-the-art text compression techniques and show that it achieves a higher compression ratio.

Cite

APA

Rahman, Md. A., & Hamada, M. (2021). Lossless text compression using GPT-2 language model and Huffman coding. SHS Web of Conferences, 102, 04013. https://doi.org/10.1051/shsconf/202110204013
