An Enhancement of Optical Character Recognition (OCR) Algorithm Applied in Translating Signages to Filipino

Authors

  • Shieryl E. Tendilla, Pamantasan ng Lungsod ng Maynila
  • Ivan Rey R. Dumago, Pamantasan ng Lungsod ng Maynila
  • Francis Arlando L. Atienza, Pamantasan ng Lungsod ng Maynila
  • Dan Michael A. Cortez, Pamantasan ng Lungsod ng Maynila

Keywords:

Optical Character Recognition (OCR), Image Processing, Text Recognition, Computer Vision, Machine Learning

Abstract

Optical Character Recognition (OCR) systems often struggle to extract text accurately from images captured at varying distances, particularly under challenging conditions such as blur, noise, or poor lighting. These conditions are common in real-world scenarios and limit the effectiveness of existing OCR technologies. This study addresses these challenges by applying a Gaussian blur after the grayscale conversion step. The blur reduces image noise and improves clarity without sacrificing the key features of the original algorithm. Results showed that the enhanced OCR algorithm significantly outperformed existing methods in both accuracy and confidence levels. It read signage with higher precision, even under difficult conditions such as intricate designs, poor lighting, and long distances. This advancement enables more reliable text recognition and translation, with practical applications in public signage translation, cross-cultural communication, and improved accessibility in multilingual environments.
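
The preprocessing order described in the abstract (grayscale conversion first, then Gaussian blur) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the 5×5 kernel size, sigma = 1.0, the BT.601 luminance weights, and the edge-replication border handling are all assumptions chosen for illustration, since the abstract does not state these parameters. The OCR and translation stages are omitted.

```python
import math


def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples to rows of luminance values.

    Uses the ITU-R BT.601 weights (an assumption; the abstract does not
    say which grayscale formula the study uses).
    """
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def gaussian_kernel(size=5, sigma=1.0):
    """Build a normalized size x size Gaussian kernel (size must be odd)."""
    if size % 2 == 0 or size < 1:
        raise ValueError("kernel size must be a positive odd integer")
    half = size // 2
    weights = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
                for x in range(-half, half + 1)]
               for y in range(-half, half + 1)]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def gaussian_blur(gray, size=5, sigma=1.0):
    """Convolve a grayscale image with a Gaussian kernel.

    Pixels outside the image are replaced by the nearest edge pixel.
    """
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    height, width = len(gray), len(gray[0])
    blurred = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for ky in range(size):
                for kx in range(size):
                    # Clamp sample coordinates to the image bounds.
                    sy = min(max(y + ky - half, 0), height - 1)
                    sx = min(max(x + kx - half, 0), width - 1)
                    acc += gray[sy][sx] * kernel[ky][kx]
            blurred[y][x] = acc
    return blurred


def preprocess(rgb_image, size=5, sigma=1.0):
    """Grayscale conversion followed by Gaussian blur, in that order."""
    return gaussian_blur(to_grayscale(rgb_image), size, sigma)
```

In a real pipeline the same two steps would more likely be done with an image library (for example OpenCV's `cv2.cvtColor` followed by `cv2.GaussianBlur`) before the result is passed to the OCR engine. The kernel size and sigma would need tuning, because too strong a blur can erase thin character strokes.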

Published

2025-01-11