Lossless text compression using GPT-2 language model and Huffman coding

  • Rahman M
  • Hamada M

Abstract

Modern daily-life activities, together with advances in telecommunication, generate enormous amounts of data. Storing this data on digital devices or transmitting it over the Internet is challenging, which makes data compression necessary; research on data compression has therefore become a topic of great interest. Because compressed data is smaller than the original, compression saves storage space and increases transmission speed. In this article, we propose a lossless text compression technique that combines the GPT-2 language model with Huffman coding. In the proposed method, the Burrows-Wheeler transform and a list of keys are first used to reduce the length of the original text file; the GPT-2 language model and Huffman coding are then applied for encoding. We compare the proposed method with state-of-the-art text compression techniques and show that it achieves a higher compression ratio.

Cite

APA

Rahman, Md. A., & Hamada, M. (2021). Lossless text compression using GPT-2 language model and Huffman coding. SHS Web of Conferences, 102, 04013. https://doi.org/10.1051/shsconf/202110204013
