An Enhancement of Optical Character Recognition (OCR) Algorithm Applied in Translating Signages to Filipino
Keywords:
Optical Character Recognition (OCR), Image Processing, Text Recognition, Computer Vision, Machine Learning
Abstract
Optical Character Recognition (OCR) systems often struggle to extract text accurately from images captured at varying distances, particularly under challenging conditions such as blur, noise, or poor lighting. These conditions are common in real-world scenarios and limit the effectiveness of existing OCR technologies. This study addresses them by applying a Gaussian blur immediately after the grayscale conversion step. The added step reduces image noise and improves clarity without sacrificing the key features of the original algorithm. Results showed that the enhanced OCR algorithm significantly outperformed existing methods in both accuracy and confidence level. It read signage with higher precision even under difficult conditions such as intricate designs, poor lighting, and long distances. This advancement enables more reliable text recognition and translation, with practical applications in public signage translation, cross-cultural communication, and improved accessibility in multilingual environments.
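The preprocessing order described above (grayscale conversion first, Gaussian blur second) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the BT.601 grayscale weights, the 5-pixel kernel, the sigma of 1.0, and the edge-replication border handling are all assumptions chosen for illustration, and the OCR step itself is omitted. A practical pipeline would more likely rely on an image-processing library.

```python
import math


def to_grayscale(rgb):
    """Convert an RGB image (rows of (r, g, b) tuples) to grayscale floats."""
    # ITU-R BT.601 luma weights: an assumed, commonly used choice.
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb]


def gaussian_kernel(size=5, sigma=1.0):
    """Return a normalized 1-D Gaussian kernel of odd length `size`."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    half = size // 2
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def _blur_rows(img, kernel):
    """Convolve every row with the 1-D kernel, replicating edge pixels."""
    half = len(kernel) // 2
    out = []
    for row in img:
        n = len(row)
        new_row = []
        for x in range(n):
            acc = 0.0
            for k, w in enumerate(kernel):
                # Clamp the index so border pixels reuse the nearest value.
                j = min(max(x + k - half, 0), n - 1)
                acc += w * row[j]
            new_row.append(acc)
        out.append(new_row)
    return out


def _transpose(img):
    return [list(col) for col in zip(*img)]


def gaussian_blur(gray, size=5, sigma=1.0):
    """Apply a separable Gaussian blur: rows first, then columns."""
    kernel = gaussian_kernel(size, sigma)
    horizontal = _blur_rows(gray, kernel)
    return _transpose(_blur_rows(_transpose(horizontal), kernel))


def preprocess(rgb, size=5, sigma=1.0):
    """Grayscale conversion followed by Gaussian blur, in that order."""
    return gaussian_blur(to_grayscale(rgb), size, sigma)
```

Calling `preprocess(image)` on a list of rows of `(r, g, b)` tuples returns a smoothed grayscale image of the same dimensions, which would then be passed to the recognition stage. Larger `size` or `sigma` values remove more noise but also soften character edges, so both would need tuning against real signage images.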