Enhancement of Markov Chain-Based Linguistic Steganography with Binary Encoding for Securing Legal Documents

Authors

  • Halili A B, Pamantasan ng Lungsod ng Maynila
  • Salangsang G A, Pamantasan ng Lungsod ng Maynila

Keywords:

security, steganography, Markov chain

Abstract

Markov chain-based linguistic steganography can effectively hide information within human-like cover text, but its processing speed is severely limited. Traditional implementations rely on Huffman tree-based encoding and are slow mainly because of the computational overhead of building the tree itself. To address this issue, this study proposes an enhanced algorithm that uses binary indexing to achieve constant time complexity. The results were obtained experimentally using models of varying state sizes, all derived from the same text corpus as a control variable. Perplexity analysis was also employed to evaluate imperceptibility and to confirm that the enhancement does not degrade the quality of the cover text. The results indicate that the enhanced algorithm improves processing speed by up to 54 times across all state sizes without compromising imperceptibility, so the existing algorithm becomes significantly faster while remaining secure in its concealment. In practice, the algorithm was applied to legal document storage to strengthen the security of the stored documents.
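
The abstract does not spell out the encoding itself, so the following is only a minimal sketch of one plausible reading of "binary indexing": at each Markov state, the next k secret bits are used directly as an index into a fixed, sorted list of successor words, so no Huffman tree has to be built. The function names (build_model, candidates, embed, extract), the toy corpus, and the choice to truncate each successor list to a power of two are all assumptions made for illustration and are not taken from the paper. The sketch also assumes that every reachable state has at least one successor and that the chain regularly passes through states with more than one successor.

```python
from collections import defaultdict


def build_model(tokens, order=1):
    """Map each state (a tuple of `order` tokens) to a sorted list of distinct successors."""
    successors = defaultdict(set)
    for i in range(len(tokens) - order):
        state = tuple(tokens[i:i + order])
        successors[state].add(tokens[i + order])
    # Sorting gives sender and receiver the same deterministic ordering.
    return {state: sorted(nxt) for state, nxt in successors.items()}


def candidates(model, state):
    """Return the first 2**k successors of `state`, and k, the bits one step can carry."""
    options = model[state]
    k = len(options).bit_length() - 1  # floor(log2(number of successors))
    return options[:1 << k], k


def embed(model, start, bits, max_steps=10_000):
    """Hide the bit string `bits` in a token sequence that begins with the tuple `start`."""
    state, out, pos = tuple(start), list(start), 0
    for _ in range(max_steps):
        if pos >= len(bits):
            return out
        options, k = candidates(model, state)
        # Pad the final chunk with zeros; the receiver truncates to the known length.
        chunk = bits[pos:pos + k].ljust(k, "0")
        index = int(chunk, 2) if k else 0  # direct list index, no tree traversal
        token = options[index]
        out.append(token)
        pos += k
        state = state[1:] + (token,)
    raise RuntimeError("step limit reached before all bits were embedded")


def extract(model, stego, n_bits, order=1):
    """Recover the first `n_bits` hidden bits by replaying the same chain."""
    bits = []
    state = tuple(stego[:order])
    for token in stego[order:]:
        options, k = candidates(model, state)
        if k:
            # Linear scan kept for brevity; a per-state dict would make this O(1) too.
            bits.append(format(options.index(token), f"0{k}b"))
        state = state[1:] + (token,)
    return "".join(bits)[:n_bits]


# Toy corpus (an assumption): every word has at least one successor.
corpus = "the cat sat on the mat the dog sat on the rug the cat ran to the dog".split()
model = build_model(corpus)
secret = "110100"
stego = embed(model, ("the",), secret)
print(" ".join(stego))
print(extract(model, stego, len(secret)))
```

In this sketch the receiver must know the model, the ordering of successors, and the message length in bits. Truncating each successor list to a power of two wastes some capacity compared with Huffman coding, which is the kind of trade-off a fixed-length scheme would accept in exchange for skipping tree construction.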

Published

2025-01-07