Modern daily life activities produced lots of information for the advancement of telecommunication. It is a challenging issue to store them on a digital device or transmit it over the Internet, leading to the necessity for data compression. Thus, research on data compression to solve the issue has become a topic of great interest to researchers. Moreover, the size of compressed data is generally smaller than its original. As a result, data compression saves storage and increases transmission speed. In this article, we propose a text compression technique using GPT-2 language model and Huffman coding. In this proposed method, Burrows-Wheeler transform and a list of keys are used to reduce the original text file’s length. Finally, we apply GPT-2 language mode and then Huffman coding for encoding. This proposed method is compared with the state-of-the-art techniques used for text compression. Finally, we show that the proposed method demonstrates a gain in compression ratio compared to the other state-of-the-art methods.
CITATION STYLE
Rahman, Md. A., & Hamada, M. (2021). Lossless text compression using GPT-2 language model and Huffman coding. SHS Web of Conferences, 102, 04013. https://doi.org/10.1051/shsconf/202110204013
Mendeley helps you to discover research relevant for your work.