Design and Implementation of a Python-Based Tool for Recognizing Text in Images (Graduation Thesis) - 688IT编程网

Abstract

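As computers have become widespread, documents, literature, archives and books are increasingly produced in digital form. A very large stock of earlier paper documents, literature, archives and books remains, however, and keeping this content on paper is both inconvenient and risky. Paper cannot be restored: once a sheet is damaged, the content recorded on it is lost, and paper is also hard to distribute. Converting paper materials into electronic form is therefore necessary. Optical Character Recognition (OCR) is a technology that recognizes content printed or written on paper as characters and stores it on a computer, and it plays a vital role in fields such as text entry and book digitization.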

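Several factors affect the success rate of OCR, such as the background of the image file and the font of the characters being recognized. This thesis studies how preprocessing the image file and training a font library can raise the recognition success rate. The work covers the following: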

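(1) Developing a Python-based OCR tool.

(2) Applying grayscale conversion, binarization and noise reduction to the image to reduce interference from the background and non-character content, improving recognition accuracy.

(3) Training a font library so that the OCR tool, in addition to becoming more accurate, can recognize fonts and character content beyond ordinary printed typefaces.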

Keywords: OCR technology; informatization; paper documents; text entry; grayscale processing; binarization processing

Abstract

With the development of the times and the spread of computers, documents, literature, archives and books are gradually moving to digital form. A large stock of paper documents, literature, archives and books already existed before this shift, however, and using paper as the carrier for this content brings many inconveniences and security risks. Paper cannot be restored: once it is damaged, the content recorded on it is lost, and paper is also inconvenient to distribute, so converting paper materials into electronic form is necessary. Optical Character Recognition (OCR) is a technology that recognizes printed or handwritten content as characters and saves them to a computer. It plays an important role in fields such as text entry and book digitization.

When OCR is used for recognition, several factors affect the success rate, such as the background of the image file and the font of the recognized characters. This paper focuses on improving the recognition success rate by preprocessing the image file and training a font library. The research content of this project mainly includes:

(1) Developing a Python-based OCR tool.

(2) Reducing interference from the background and non-character content in the image through grayscale conversion, binarization and noise reduction, to improve recognition accuracy.

(3) Training a font library to improve recognition accuracy and to enable the developed OCR tool to recognize special fonts and characters in addition to ordinary printed fonts.
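
The preprocessing chain named in item (2), grayscale conversion followed by binarization and noise reduction, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the thesis's own code: the function names, the fixed threshold of 128, the BT.601 luminance weights and the 3x3 median filter are all hypothetical choices, and plain Python lists stand in for an imaging library such as Pillow or OpenCV so that the example runs without dependencies.

```python
from statistics import median


def to_grayscale(rgb):
    """Convert rows of (R, G, B) tuples into rows of 0-255 luminance values."""
    # ITU-R BT.601 luma weights; a common choice, assumed here.
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        for row in rgb
    ]


def binarize(gray, threshold=128):
    """Map each pixel to 0 (dark) or 255 (light) using a fixed threshold."""
    return [[255 if value >= threshold else 0 for value in row] for row in gray]


def median_denoise(img):
    """Replace each pixel with the median of its 3x3 neighbourhood.

    Coordinates outside the image are clamped to the nearest edge pixel.
    """
    height, width = len(img), len(img[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                img[min(max(y + dy, 0), height - 1)][min(max(x + dx, 0), width - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            row.append(int(median(window)))
        result.append(row)
    return result


def preprocess(rgb, threshold=128):
    """Run grayscale conversion, binarization and denoising in sequence."""
    return median_denoise(binarize(to_grayscale(rgb), threshold))
```

In this sketch an isolated dark pixel on a light background is removed, because the median of its neighbourhood is light, while a stroke at least three pixels wide survives. The same filter also erases one-pixel-wide strokes, so the window size and threshold would need tuning against the fonts the tool is expected to read.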
