Enhancement of Markov Chain-Based Linguistic Steganography with Binary Encoding for Securing Legal Documents
Keywords:
security, steganography, Markov chain
Abstract
The Markov chain-based linguistic steganography algorithm can effectively hide information within human-like cover text, but its processing speed is highly limited. Traditional implementations that rely on Huffman tree-based encoding are slow mainly because of the computational overhead of building the tree itself. To address this issue, this study proposes an enhanced algorithm that uses binary indexing to achieve constant time complexity. Results were obtained experimentally using models of varying state sizes derived from the same text corpus, which served as a control variable. Perplexity analysis was also employed to evaluate imperceptibility and to confirm that the enhancement does not degrade the integrity of the cover text. The results indicate that the enhanced algorithm improves processing speed by up to 54 times across all state sizes without compromising imperceptibility. The enhancement therefore makes the existing algorithm significantly faster while keeping its concealment secure. In practice, the algorithm was applied to legal document storage to strengthen its security.
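The idea of replacing a per-state Huffman tree with fixed-width binary indexing can be illustrated with a minimal sketch. This is not the study's implementation. It assumes a first-order, word-level Markov model, a sorted successor list shared by sender and receiver, and truncation of each list to the largest power of two so that a fixed number of secret bits selects the next word. All names (`build_model`, `usable_bits`, `embed`, `extract`) and the sample corpus are hypothetical.

```python
from collections import defaultdict


def build_model(corpus: str) -> dict[str, list[str]]:
    """Map each word to a sorted list of the words that follow it."""
    words = corpus.split()
    successors = defaultdict(set)
    # Wrap around to the first word so every state has a successor.
    for current, nxt in zip(words, words[1:] + words[:1]):
        successors[current].add(nxt)
    # Sorting gives sender and receiver the same candidate order.
    return {word: sorted(nexts) for word, nexts in successors.items()}


def usable_bits(candidates: list[str]) -> int:
    """Largest k such that 2**k <= len(candidates)."""
    return len(candidates).bit_length() - 1


def embed(model, start: str, bits: str, max_words: int = 10_000) -> list[str]:
    """Generate cover words, choosing each word by the next k secret bits."""
    state, cover, pos = start, [], 0
    while pos < len(bits):
        if len(cover) >= max_words:
            raise ValueError("model cannot carry the payload from this start")
        candidates = model[state]
        k = usable_bits(candidates)
        # Pad the final chunk with zeros; the receiver trims by length.
        chunk = bits[pos:pos + k].ljust(k, "0")
        index = int(chunk, 2) if k else 0  # direct list index, no tree
        state = candidates[index]
        cover.append(state)
        pos += k
    return cover


def extract(model, start: str, cover: list[str], n_bits: int) -> str:
    """Recover the secret bits by replaying the same walk over the model."""
    state, recovered = start, []
    for word in cover:
        candidates = model[state]
        k = usable_bits(candidates)
        if k:
            recovered.append(format(candidates.index(word), f"0{k}b"))
        state = word
    return "".join(recovered)[:n_bits]


corpus = ("the court finds the motion valid and the court denies "
          "the appeal and the motion stands")
model = build_model(corpus)
secret = "1011"
cover = embed(model, "the", secret)
print(" ".join(cover))
print(extract(model, "the", cover, len(secret)) == secret)
```

In this sketch, choosing the next word during embedding is a single list lookup, so no tree has to be built per state. Extraction uses a linear search (`list.index`) only for brevity; a precomputed word-to-index map would avoid it. Truncating to a power of two discards some successors, which is a simplification made here and not a claim about the proposed algorithm.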