Application Example: Photo OCR


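Build an OCR server in Python in 5 minutes and extract text straight from screenshots

Why? OCR (optical character recognition) has become a commonly used Python tool.

With the arrival of open-source libraries such as Tesseract and Ocrad, more and more developers are using OCR to build their own libraries and bots.

A small example: using OCR to extract text directly from a screenshot saves the trouble of retyping it.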

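Getting started

Before we begin, we need to develop a back-end service layer that exposes the results of the OCR engine.

That way you can present the results to end users in whatever way you like.

This is covered in detail later in this article.

Beyond that, we also need a little back-end code to generate the HTML form, plus some front-end code that uses these APIs.

That part is not covered here, but you can consult the source code.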
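As a rough illustration of such a back-end service layer, the sketch below wraps an OCR call in a tiny JSON endpoint. It uses only the Python standard library (WSGI) rather than whatever web framework the tutorial's own source code uses, and `run_ocr` is a hypothetical placeholder, not a real Tesseract binding.

```python
import json


def run_ocr(image_bytes):
    # Hypothetical placeholder: a real server would hand the image to the
    # OCR engine here and return the recognized text.
    return "recognized text would appear here"


def app(environ, start_response):
    """Minimal WSGI endpoint: POST raw image bytes, receive JSON back."""
    headers = [("Content-Type", "application/json")]
    if environ.get("REQUEST_METHOD") != "POST":
        start_response("405 Method Not Allowed", headers)
        return [json.dumps({"error": "use POST"}).encode("utf-8")]
    # Read exactly the number of bytes the client announced.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    image_bytes = environ["wsgi.input"].read(length)
    body = json.dumps({"output": run_ocr(image_bytes)}).encode("utf-8")
    start_response("200 OK", headers)
    return [body]
```

To try it locally, the callable can be served with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`. Returning plain JSON keeps the OCR layer independent of how the front end chooses to display the result.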

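Ready? Let's get started by installing some dependencies.

This part is fun, too.

First, install the dependencies

This article was tested on Ubuntu 14.04, but versions 12.x and 13.x should also work.

If you are on OS X, you can use VirtualBox, Docker (check whether a Dockerfile and install guide are included), or DigitalOcean (recommended) to create a suitable environment.

Download the dependencies

We need Tesseract and all of its dependencies, including Leptonica and other related packages.

Note: you can use the _run.sh shell script to quickly install the dependencies of Leptonica and Tesseract.

If you do, you can skip straight to the section on setting up the web server.

For the sake of learning, though, if you have never built libraries by hand before, it is still worth trying.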

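What happened? In short, sudo apt-get update refreshes the package lists.

We can then install some image-processing packages such as libtiff and libpng.

In addition, we need Python 2.7 as our programming language, plus the python-imaging library.

Speaking of images, we also need the ImageMagick package before we can edit images in the program.

$ sudo apt-get install imagemagick

Building Leptonica and Tesseract

Again, if you already installed everything with the _run.sh script, you can skip straight to the section on setting up the web server.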

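Introduction to AI Applications Lab Manual: OCR Lab Manual

4.3 Task 2: Text recognition (OCR)

4.3.1 Task objectives

This task applies deep learning to captcha image recognition. A program generates captchas to prepare the dataset, OpenCV splits them into characters, the Keras deep learning framework is used to build and train a model, and finally the saved model is loaded to recognize captchas. Working through it enables readers to:

(1) experience the complete development of a deep learning application for captcha image recognition;

(2) master operations such as generating, processing, and reading image data;

(3) master building, training, and predicting with a model in a deep learning framework.

4.3.2 Task implementation

The main workflow of this project is: grayscale → binarization → removal of interference lines and noise → splitting into single characters → labeling → training to obtain a model → recognition with the model.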

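The raw captcha goes through the following steps:

(1) Convert the image to grayscale, as shown in Figure 4-32.

Figure 4-32 Grayscale conversion

(2) Binarize the image using a threshold you choose.

Figure 4-33 Binarization

(3) Denoise the image, removing interfering pixels and pixel blocks.

Figure 4-34 Denoising

(4) Split the image into individual characters and label them manually.

Figure 4-35 Image segmentation

(5) Train a convolutional neural network to obtain the model.

(6) Use the trained model to recognize captchas.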
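Steps (1) to (4) above can be sketched in a few lines of plain Python. This is a standard-library stand-in operating on nested lists, not the OpenCV calls the manual itself uses; the luminance weights, the default threshold of 128, and the "isolated pixel" noise rule are assumptions chosen for illustration.

```python
def to_gray(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of 0-255 gray levels."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb_image]


def binarize(gray, threshold=128):
    """Map dark pixels (below threshold) to 1 (ink) and the rest to 0."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def remove_isolated(binary):
    """Clear ink pixels that have no ink among their 8 neighbours."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(h):
        for x in range(w):
            if not binary[y][x]:
                continue
            neighbours = sum(
                binary[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbours == 0:
                out[y][x] = 0
    return out


def split_columns(binary):
    """Return (start, end) column ranges separated by ink-free columns."""
    w = len(binary[0])
    has_ink = [any(row[x] for row in binary) for x in range(w)]
    spans, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            spans.append((start, x))
            start = None
    if start is not None:
        spans.append((start, w))
    return spans
```

Splitting on empty columns only works when the characters do not touch; real captchas with overlapping glyphs or interference lines need a more robust segmentation step.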

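Step 1: Generating captchas in bulk

Building a captcha recognition model with a deep learning framework requires a large number of captcha images.

Here the captcha module is used to generate them; each image file is named after the string shown in the captcha.

The captchas use the 10 digits plus 26 + 26 lowercase and uppercase English letters, 62 character classes in total.

Running the program generates the captchas in the pic folder under the current directory.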
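The label side of this step can be sketched with the standard library alone. The code below only builds the 62-character set and the random strings that double as file names; rendering the images with the captcha module is left out, and the function names, the length of 4, and the count of 100 are illustrative assumptions.

```python
import random
import string

# 10 digits + 26 lowercase + 26 uppercase letters = 62 character classes
CHARSET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def random_label(length=4, rng=random):
    """Return a random captcha string of `length` characters."""
    return "".join(rng.choice(CHARSET) for _ in range(length))


def make_labels(count=100, length=4, seed=None):
    """Return `count` distinct labels; each label doubles as a file name stem."""
    rng = random.Random(seed)
    labels = set()
    while len(labels) < count:
        labels.add(random_label(length, rng))
    return sorted(labels)
```

Labels are kept distinct because the file name is the label, so a repeated string would overwrite an earlier image. On a case-insensitive file system, labels that differ only in letter case would still collide.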

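Run the code 1gen_captcha.py to generate random captcha images; after it runs, 100 images are written to the pic folder, as shown in Figure 4-36.

Figure 4-36 Captcha dataset

Step 2: Viewing the captchas with OpenCV

The generated captchas can be viewed with the computer's default image viewer or displayed by a program. Here the OpenCV module is used to read and display an image and to print information about it, such as its width and height.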

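Code path: 2show_img.py

    import cv2

    file_name = "./test_img/test_img_1.png"
    # Read the image; cv2.imread returns None if the file cannot be read
    img = cv2.imread(file_name)
    if img is None:
        raise FileNotFoundError(file_name)
    # Print the image size information (height, width, channels)
    print(img.shape)
    # Show the captcha image in a window named "win"
    cv2.imshow("win", img)
    # Delay in milliseconds; 0 keeps the window open until a key is pressed
    cv2.waitKey(0)

The result is shown in Figure 4-37.

Figure 4-37 Captcha display

Step 3: Image binarization

Each captcha image contains four characters, so the image has to be processed and split into four separate characters; recognition is then performed on each character individually.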

拍照英语翻译器


拍照英语翻译器Taking photos, whether with a professional camera or a smartphone, is a popular activity that allows us to capture special moments, beautiful landscapes, and interesting subjects. With the advancement of technology, there are now numerous photo translation apps available that can translate text in images from one language to another. These apps have made it easier for travelers, language learners, and professionals to understand and communicate in a foreign language.The photo translation apps work by using optical character recognition (OCR) technology to recognize the text in the image and then using machine translation to translate it into the desired language. This technology has greatly improved over the years and can now accurately translate a wide range of languages, including Chinese, Spanish, French, and many others.Using a photo translation app is simple and convenient. All you need to do is open the app, select the option to translate text in an image, and then take a photo of the text you want to translate. The app will then analyze the image and provide you with the translated text in real-time. Some apps also have the option to manually select the text you want to translate by highlighting it on the image.Photo translation apps have many practical uses. For example, when traveling to a foreign country, you may come across signs, menus, or documents in a language you don't understand. By using a photo translation app, you can quickly and easily translate the text and understand what it means. This can be especially helpfulwhen trying to navigate public transportation, order food at a restaurant, or read important information.Moreover, photo translation apps are also useful for language learners who want to practice reading and understanding a foreign language. Instead of looking up each word in a dictionary, they can simply take a photo of a text and instantly see the translation. 
This can save a lot of time and effort and make the language learning process more efficient.Another application for photo translation apps is in professional settings. For example, if you receive an email or document in a foreign language, you can use a photo translation app to translate it and understand its content. This can be particularly helpful for businesses or individuals working with international clients or partners.In conclusion, photo translation apps have revolutionized the way we communicate and understand foreign languages. They have made it easier and more efficient to translate text in images, whether it's for travel, language learning, or professional purposes. With the constant advancement of technology, we can expect these apps to become even more accurate and reliable in the future. So next time you come across a text in a foreign language that you don't understand, don't worry – just take out your phone and use a photo translation app to instantly get the translation you need.。

斑马技术公司DS8108数字扫描仪产品参考指南说明书

Chapter 1: Getting Started
Introduction
Interfaces
Unpacking
Setting Up the Digital Scanner
Installing the Interface Cable
Removing the Interface Cable
Connecting Power (if required)
Configuring the Digital Scanner

CTPN在快递单文字检测中的应用研究


CTPN在快递单文字检测中的应用研究李欢欢 徐小云 王红蕾(四川中电启明星信息技术有限公司 四川成都 610041)摘要:文本检测与文字识别技术是计算机视觉应用技术的重点研究方向,其中前者是后者的研究基础。

文章探索了CTPN(Connectionist Text Proposal Network)技术在物流快递单文字识别任务中对文字检测的有效性和准确性。

CTPN方法引入了垂直锚点机制,结合CNN(Convolutional Neural Network)网络模型提取的空间特征与BLSTM(Bi-directional Long Short-Term Memory)神经网络模型提取的序列特征来提高文字检测精度。

文章首先探讨了CTPN技术的相关原理,再构造快递单文字识别实验验证其对文字的检测能力。

实验证明,CTPN技术能有效且准确地检测出快递单中的文字,为文字识别奠定基础。

关键词:文字检测 卷积神经网络 文本检测网络 区域候选网络中图分类号:TM715文献标识码:A 文章编号:1672-3791(2023)15-0058-04Research on the Application of CTPN in the Text Detection ofExpress WaybillsLI Huanhuan XU Xiaoyun WANG Honglei(Aostar Information Technologies Co., Ltd., Chengdu, Sichuan Province, 610041 China)Abstract:Text detection and text recognition technology are the key research directions of computer vision appli‐cation technology, and the former is the research basis of the latter. This paper explores the effectiveness and accu‐racy of CTPN technology for text detection in the text recognition task of logistics express waybills. The CTPN method introduces a vertical anchor mechanism, and improves the accuracy of text detection in combination with the spatial features extracted by the CNN network model and the sequence features extracted by the BLSTM neural network model. This paper first discusses the relevant principles of CTPN technology, and then constructs the text recognition experiment of express waybills to verify its ability to detect texts. The experiment shows that CTPN technology can effectively and accurately detect the text in the express bill and lay foundation for text recognition. Key Words: Text detection; Convolutional neural network; Text detection network; Region proposal network随着物流业的快速发展,各快递点的分拣、配发工作愈加繁重,且不同快递公司快递单设计样式不统一。

佳能SX150IS软件使用说明

··除USB鼠标或键盘外,如果同时使用其他USB设备,连接可能无法正确操作。在这种情重新接上相机。
··请勿将两个或更多相机同时连接到同一台计算机上,此种连接将无法正确操作。 ··如果相机是经由USB界面连接线连接计算机时,切勿让计算机进入睡眠(待机)状态。如果
Printing a whole image on a single sheet of paper
Index printing
How to use CameraWindow
The first menu screen that opens
Image transfer screen
Screen for organizing images

Adobe Acrobat SDK 开发者指南说明书

Please remember that existing artwork or images that you may want to include in your project may be protected under copyright law. The unauthorized incorporation of such material into your new work could be a violation of the rights of the copyright owner. Please be sure to obtain any permission required from the copyright owner.
This guide is governed by the Adobe Acrobat SDK License Agreement and may be used or copied only in accordance with the terms of this agreement. Except as permitted by any such agreement, no part of this guide may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, recording, or otherwise, without the prior written permission of Adobe. Please note that the content in this guide is protected under copyright law.

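umiocr command-line usage

UmiOCR is used from the command line as follows.
UmiOCR command-line options include:
* `-h` or `--help`: show help information.
* `-v` or `--version`: show version information.
* `-l` followed by a language code: for example, `-l eng` selects English recognition (English is the default).
* `-c` followed by the path to a configuration file.
* `-d`: print debugging information.
* `-s`: convert the document image to single-column text.
* `-p`: write the recognition result to a PDF file.

For example, to recognize the image file `image.png` and print the result to the console, use: `umiocr image.png`.

To set the language to Chinese and print the result to the console, use: `umiocr -l chi image.png`.

To write the recognition result to a PDF file, use: `umiocr -p image.png`.

To specify a configuration file and print debugging information, use: `umiocr -c config.ini -d image.png`.

To convert the document image to single-column text and write it to a PDF file, use: `umiocr -s -p image.png`.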

佳能打印机MP145使用手册


Canon pixmaMP145打印机使用手册本电子手册的屏幕包括两个部分:菜单框架(左侧)和内容框架(右侧)。

注释本节以英文屏幕拍图为例进行说明。

(1)菜单框架(2)内容框架注释1.本手册中说明的操作只适用于运行安装有service pack 2的windows xp操作系统(以下称为“windows xp sp2”)或运行mac os x 10.4.x操作系统的计算机。

根据所用的操作系统,操作可能有所不同。

2.以上示例中使用windows环境下的屏幕。

macintosh环境下的屏幕外观稍有不同。

1 从菜单浏览单击菜单框架中的某个标题。

内容框架中会出现相应的主题。

2 在内容框架中浏览(1)带有蓝色下划线的文本是本手册中相关章节的链接。

单击该链接跳转至相关页面。

3 打印本手册windows要打印某个主题,单击内容框架以确保其处于激活状态,然后单击打印按钮。

注释不激活内容框架而单击打印按钮可以打印菜单框架。

macintosh要打印某个主题,单击内容框架以确保其处于激活状态,然后在文件(file)菜单上选择打印...(print...),并单击打印(print)按钮。

注释不激活内容框架而单击打印(print)按钮可以打印菜单框架。

4 通过关键字搜索主题可以通过输入关键字搜索主题。

windows单击显示按钮。

在搜索屏幕中,输入关键字并单击列出主题按钮。

从搜索结果列表中选择要阅读的主题,然后单击显示按钮。

将显示该页面。

注释此功能搜索计算机上已安装的所有电子手册。

macintosh注释1搜索xxx:搜索当前电子手册。

* “xxx”表示本机的名称和电子手册的标题。

2搜索所有帮助:搜索计算机上安装的所有在线帮助。

5 本手册中使用的符号如果忽略这些说明,可能会由于不正确的设备操作,而导致死亡或严重的人身伤害。

为了能够安全地操作,必须留意这些警告。

如果忽略这些说明,可能会由于不正确的设备操作,而导致人身伤害或器材损坏。

ocr用户使用手册


ocr用户使用手册OCR(Optical Character Recognition)用户使用手册欢迎使用OCR技术,该技术可以将纸质文档上的文字转换为可编辑和可搜索的电子文本。

以下是OCR用户使用手册的步骤和说明:步骤1:安装OCR软件首先,您需要安装一款OCR软件。

常见的OCR软件包括Adobe Acrobat、ABBYY FineReader、Tesseract等。

根据您的需求和预算选择最适合的软件,并按照其安装向导进行安装。

步骤2:准备要识别的文档将待识别的纸质文档放在扫描仪或拍照设备上,确保图像清晰且文字易于辨认。

如果您已有电子文档,可以跳过此步骤。

步骤3:使用OCR软件进行识别打开OCR软件,并加载要进行文字识别的图像或文档。

根据软件界面的指引,选择OCR功能或选项,并开始识别过程。

识别的具体步骤可能因软件而异,通常包括预处理图像、选择识别语言、识别文字等操作。

步骤4:编辑识别的文本(如有必要)一旦OCR完成识别过程,您会得到一个可编辑的文本文件或电子文档。

检查并编辑识别的文本,纠正可能的错误或误识别。

OCR软件通常也提供文本编辑工具,使您能够直接在软件中进行修改。

步骤5:保存和导出结果完成编辑后,保存您的工作并选择适当的文件格式以导出结果。

常见的文件格式包括PDF、DOC、TXT等。

您还可以选择将导出结果保存到云存储或其他位置以进行备份和共享。

注意事项:- 确保图像清晰:使用高质量的扫描仪或拍照设备捕捉图像,并确保图像清晰可辨认。

- 选择正确的语言:在进行OCR识别之前,确保选择了正确的语言设置。

某些OCR软件还支持多种语言的同时识别。

- 编辑识别的文本:请注意检查和编辑识别的文本,因为OCR 软件可能存在误识别的情况。

尤其是对于手写文本、模糊图像或低质量的扫描件,可能需要更多的编辑工作。

希望本OCR用户使用手册能够帮助您顺利进行文字识别和转换工作。

如有其他问题,请及时咨询OCR软件的用户手册或其官方网站的支持页面。

计算机常用英语单词大全


计算机常用英语单词大全NNet PC网络计算机Network adapter card网卡Network personal computer网络个人电脑 Network terminal 网络终端Notebook computer笔记本电脑 Notebook system unit笔记本系统单元Numeric entry数字输入naïve天真的人national information infrastructure protection act of1996国际信息保护法案national service provider全国性服务供应商 Network architecture网络体系结构 Network bridge网桥Network gateway网关network manager网络管理员newsgroup新闻组no electronic theft act of1997无电子盗窃法 Node节点Nonvolatile storage非易失性存储OObject embedding对象嵌入Object linking目标链接Open architecture开放式体系结构 Optical disk光盘Optical mouse光电鼠标Optical scanner光电扫描仪Outline大纲off-line browsers离线浏览器Online storage联机存储Ppalmtop computer掌上电脑Parallel ports并行端口Passive-matrix被动矩阵PC card个人计算机卡Personal laser printer个人激光打印机 Personal video recorder card个人视频记录卡 Photo printer照片打印机Pixel像素Platform scanner平版式扫描仪 Plotter绘图仪Plug and play即插即用Plug-in boards插件卡Pointer指示器Pointing stick指示棍Port端口Portable scanner便携式扫描仪 Presentation files演示文稿Presentation graphics电子文稿程序 Primary storage主存Procedures规程Processor处理机Programming control lanugage程序控制语言Packets数据包Parallel data transmission平行数据传输 Peer-to-peer network system 得等网络系统 person-person auction site个人对个人拍卖站点physical security物理安全Pits凹面plug-in插件程序Polling轮询privacy隐私权proactive主动地programmer程序员Protocols协议provider供应商proxy server代理服务pull products推取程序push products推送程序RRAM cache随机高速缓冲器Range范围Record记录Relational database关系数据库 Replace替换Resolution分辨率Row行Read-only只读Reformatting重组regional service provider区域性服务供应商 repetitive motion injury 反复性动作损伤 reverse directory反向目录right to financial privacy act of 1979财产隐私法案Ring network环形网络SScanner扫描器Search查找Secondary storage device助存储设备 Semiconductor半导体Serial ports串行端口Server服务器Shared laser printer共享激光打印机 Sheet表格Silicon chip硅片Slots插槽Smart card智能卡Soft copy软拷贝Software suite软件协议Sorting排序分类Source file源文件Special-purpose application专用文件 Spreadsheet电子数据表Standard toolbar标准工具栏 Supercomputer巨型机System cabine 系统箱System clock时钟System software系统软件 Satellite/air connection services卫星无线连接服务search engines搜索引擎search providers搜索供应者 search services 搜索服务器 
Sectors扇区security安全Sending and receiving devices发送接收设备Sequential access顺序存取 Serial data transmission单向通信signature line签名档snoopware监控软件software copyright act of1980软件版权法案software piracy软件盗版Solid-state storage固态存储器 specialized search engine专用搜索引擎spiders网页爬虫spike尖峰电压Star network星型网Strategy方案subject主题subscription address预定地址 Superdisk超级磁盘surfing网上冲浪surge protector浪涌保护器 systems analyst系统分析师TTable二维表Telephony电话学Television boards电视扩展卡 Terminal 终端Template模板Text entry文本输入Thermal printer 热印刷 Thin client瘦客Toggle key触发键Toolbar工具栏Touch screen触摸屏Trackball追踪球TV tuner card电视调谐卡 Two-state system双状态系统 technical writer 技术协作者 technostress重压技术telnet远程登录Time-sharing system分时系统 Topology拓扑结构Tracks磁道traditional cookies传统的信息记录程序Twisted pair双绞线UUnicode统一字符标准uploading上传usenet世界性新闻组网络VVirtual memory虚拟内存 Video display screen视频显示屏 Voice recognition system声音识别系统vertical portal纵向门户video privacy protection act of 1988视频隐私权保护法案virus checker病毒检测程序 virus病毒Voiceband音频带宽Volatile storage易失性存储 voltage surge冲击性电压WWand reader 条形码读入Web 网络Web appliance 环球网设备Web page网页Web site address网络地址Web terminal环球网终端Webcam摄像头What-if analysis假定分析Wireless revolution无线革命Word字长Word processing文字处理Word wrap自动换行Worksheet file 工作表文件web auctions网上拍卖web broadcasters网络广播web portals门户网站web sites网站web storefront creation packages网上商店创建包 web storefronts网上商店web utilities网上应用程序web-downloading utilities网页下载应用程序 webmaster web站点管理员web万维网Wireless modems无线调制解调器wireless service provider无线服务供应商 world wide web万维网worm蠕虫病毒Write-protect notch写保护口其他缩写DVD digital bersatile 数字化通用光盘 IT ingormation technology信息技术 CD compact disc 压缩盘PDA personal digital assistant个人数字助理 RAM random access memory随机存储器 WWW World Wide Web 万维网DBMS database management system数据库管理系统 HTML Hypertext Markup Language超文本标示语言 OLE object linking and embedding对象链接潜入 SQL structured query language结构化查询语言 URL uniform resouice locator统一资源定位器 AGP accelerated graphics port加速图形接口 ALU arithmetic-logic unit算术逻辑单元 CPU central processing unit中央处理器 CMOS complementary metal-oxide 
semiconductor互补金属氧化物半导体CISC complex instruction set computer复杂指令集计算机HPSB high performance serial bus高性能串行总线 ISA industry standard architecture工业标准结构体系 PCI peripheral component interconnect外部设备互连总线 PCMCIA Personal Memory Card International Association个人计算机存储卡国际协会RAM random-access memory随机存储器ROM read-only memory只读存储器USB universal serial bus通用串行总线CRT cathode-ray tube阴极射线管HDTV high-definition television高清晰度电视 LCD liquid crystal display monitor液晶显示器 MICRmagnetic-ink character recognition磁墨水字符识别器 OCR optical-character recognition光电字符识别器 OMR optical-mark recognition光标阅读器TFT thin film transistor monitor薄膜晶体管显示器其他Zip disk压缩磁盘Domain name system(DNS)域名服务器file transfer protocol(FTP)文件传送协议hypertext markup language(HTML)超文本链接标识语言 Local area network(LAN)局域网internet relay chat(IRC)互联网多线交谈Metropolitan area network(MAN)城域网Network operation system(NOS)网络操作系统 uniform resourcelocator(URL)统一资源定位器 Wide area network(WAN)广域网。

英语在线拍照翻译


英语在线拍照翻译Online Photo Translation in EnglishOnline photo translation is the process of using digital tools and software to translate text captured in images from one language to another. It has become increasingly popular in recent years, as it provides a quick and convenient way to overcome language barriers.There are many online platforms and applications that offer photo translation services. These platforms typically use optical character recognition (OCR) technology to extract text from the image and then translate it into the desired language. This allows users to simply take a photo of a document or sign, for example, and have it translated into their native language instantly.Photo translation can be used in a variety of situations. For travelers, it can be particularly useful when navigating unfamiliar places or reading menus in foreign restaurants. Rather than struggling to understand signs or decipher written instructions, travelers can simply take a photo and have the translation appear on their device.Photo translation can also be beneficial for students and researchers who need to translate texts or documents in foreign languages. It makes the process much easier and saves time compared to manually typing or copying the text into a translation tool.Another advantage of online photo translation is that it allows forreal-time communication with people who don't speak the same language. For example, if someone is in a foreign country and needs to communicate with locals who don't speak English, they can use photo translation to translate their messages instantly. This can be particularly helpful in emergency situations or when immediate assistance is needed.However, it is important to note that online photo translation may not always be 100% accurate. Due to variations in handwriting, font styles, and image quality, there may be errors or inaccuracies in the translation. 
It is always a good idea to double-check the translation if accuracy is crucial.In conclusion, online photo translation provides a convenient and efficient way to overcome language barriers. With the help of OCR technology and translation software, users can easily translate text captured in images in real time. While it may not be 100% accurate, it is a valuable tool for travelers, students, and anyone who needsto communicate in a foreign language.。

Genius 扫描仪 使用者指南


使用者指南Genius 扫描仪本出版品所提及之所有商标及品牌之所有权分别属于本公司所有。

© 2003-2004…KYE System Corp. 版权所有。

未经许可不得复制本文件之任何部份。

目录简介 (1)如何使用本指南 (1)本指南编排惯例 (2)图标说明 (2)系统需求 (2)产品及附件 (3)扫描仪功能简介 (4)第I章、安装G ENIUS扫描仪 (5)软件需求 (5)硬件需求 (5)安装和设定扫描仪 (6)步骤一、连接您的扫描仪和计算机 (6)步骤二、安装软件 (6)第三步、测试扫描仪 (8)软件设定疑难排解 (8)其它图像处理软件 (9)移除扫描仪 (9)升级到Windows XP后扫描仪的安装 (9)第II章、使用与保养 (10)操作扫描仪 (10)扫描和编辑图像 (10)状态指示灯 (12)扫描仪连接状态 (12)扫描仪的维护 (13)校正扫描仪 (13)清洁扫描仪 (13)使用与保养注意事项 (14)扫描仪的收藏 (15)直立座 (15)挂壁架 (16)第III章、如何使用快速功能按键 (17)Scan (扫描) 按键 (17)Custom (自定) 按键 (19)改变扫描设定 (20)附录A:扫描仪规格 (21)附录B:客户服务与保证 (22)产品保证声明 (22)FCC无线电频率声明 (23)与G ENIUS联系 (24)简介欢迎来到Genius扫描世界,它将成为您图像处理的最佳工具。

Genius扫描仪不仅能进行图像扫描处理,而且能把印刷文件扫描后转为电子文件输入计算机供您作进一步的编辑,从而大大提升每日计算机工作成效和专业性。

如同我们其它产品,您的新扫描仪已经过严格的测试,并以我们一贯的客户满意度为保证。

本用户指南的最后一页提供与Genius服务中心的联系信息。

欢迎与我们联系或造访GeniusLife网页 () 以获得我们更好的服务。

感谢您选择Genius的产品。

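Getting Started with HALCON (4): OCR, a Common HDevelop Tool

Compiled and edited by: YWB

This article shows how to use HDevelop's tools and plug-ins to quickly carry out character recognition (OCR) and training.

1 Character recognition

1.1 Opening the OCR assistant

Choose "Assistants" -> "Open New OCR" (助手 -> 打开新的OCR) from the menu to open an OCR assistant.

1.2 Loading an image

Before using the plug-in we first need to load an image.

Click the browse button and select the file to inspect. We use an image that ships with the HALCON examples (%HALCONEXAMPLES%\images\ocr\bottle_label_01.jpg); once loaded, the file appears in HDevelop's image window.

Images can also be imported in other ways, for example from the image window or from a camera (through the image acquisition assistant).

1.3 Selecting the character region

HALCON supports regions of interest of various shapes: circles, ellipses, rectangles, and so on.

Click the shape you want to draw, move the mouse to the image window, drag to enclose the text, and right-click to confirm.

1.4 Quick setup

Once the region of interest has been selected, the differently colored areas inside it are the segmented characters, and the recognized characters are shown below the region.

As we can see, the current recognition result is not satisfactory.

This is because the character segmentation parameters do not match the image.

Parameter tuning is one of the most tedious parts of testing an algorithm.

In the OCR assistant, however, the "Quick Setup" (快速设置) function can set the character segmentation parameters adaptively.

First enter the text we expect to read, then tick the check boxes that match the characteristics of the font. For light-colored characters, tick "characters are bright on a dark background"; for laser-marked or some stamped characters, tick "characters consist of many isolated dots…".

Finally, click "Apply Quick Setup" (应用快速设置) to finish adjusting the parameters.

Quick setup makes it easy to segment and recognize characters in images with good contrast. Because lighting and working conditions on site can change the brightness or contrast of the images, during testing we should load more images from different working conditions into the plug-in to make sure the current parameters cope with all of them.

1.5 Fine-tuning the parameters

On the "Setup" (设置) tab, load different images, adjust the region of interest, and test the current segmentation and extraction parameters on each image.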

基于Django印刷体维吾尔文识别系统的设计与实现


㊀第53卷第3期郑州大学学报(理学版)Vol.53No.3㊀2021年9月J.Zhengzhou Univ.(Nat.Sci.Ed.)Sep.2021收稿日期:2020-08-25基金项目:国家自然科学基金项目(61433012);国家 973 重点基础研究计划基金项目(2014CB340506)㊂作者简介:熊黎剑(1996 ),男,硕士研究生,主要从事OCR 研究,E-mail:xiong_lijian@;通信作者:吾守尔㊃斯拉木(1941 ),男,教授,中国工程院院士,主要从事多语种信息处理研究,E-mail:wushour@㊂基于Django 印刷体维吾尔文识别系统的设计与实现熊黎剑1,2,3,㊀吾守尔㊃斯拉木1,2,3,㊀许苗苗1,2,3(1.新疆大学信息科学与工程学院㊀新疆乌鲁木齐830046;2.新疆多语种信息技术实验室㊀新疆乌鲁木齐830046;3.新疆多语种信息技术研究中心㊀新疆乌鲁木齐830046)摘要:光学字符识别(optical character recognition,OCR)技术在图书数字化㊁文献管理等诸多领域得到了广泛应用,而相比于已十分成熟的中文㊁英文印刷体识别系统,小文种(维吾尔文)印刷体识别还有研究空间和实际应用需求㊂针对传统识别方法特征表示不足等问题,结合日益兴起的深度学习技术,采用Python 语言编程,选用经改进的卷积循环神经网络作为识别算法核心,并利用Django 设计系统框架㊂实验表明,印刷体维文识别系统的精度达到95.7%,平均速度达到12.5fps㊂该系统实现了端到端的维文整词识别㊂关键词:卷积循环神经网络;门控循环单元;连接时序分类器;印刷体维吾尔文中图分类号:TP391㊀㊀㊀㊀㊀文献标志码:A㊀㊀㊀㊀㊀文章编号:1671-6841(2021)03-0009-06DOI :10.13705/j.issn.1671-6841.20202770㊀引言随着信息化社会的不断推进,光学字符识别(optical character recognition,OCR)技术在各领域开花结果㊂印刷体文字识别在数字化办公㊁文献管理等方面均有良好的应用前景㊂相比于已成熟化的印刷体中㊁英文识别,印刷体维吾尔文识别还有研究空间[1]㊂维吾尔文多使用于我国新疆地区,包含32个字母,其中8个元音字母,24个辅音字母,词序是主语-谓语-宾语[2]㊂有一些维文字母的主体部分相同,仅依靠上下点的标记来区别不同字符[3]㊂同时,当字母出现在词前㊁词中㊁词末以及独立出现时,对应的写法也不同,切分不当会导致单词的改变,所以本文从整词识别入手㊂已有研究方法多以传统方法为主,如基于统计和结构的方法[4]㊁模板匹配法[5]等㊂这些方法往往需要较多的人工干预,包括手工设计特征和人工建立标准的匹配库等,因此效率不高㊂近些年以来,国内相关的维文OCR 系统是西安电子科技大学卢朝阳教授团队开发的维吾尔文识别软件㊂它的设计核心是:单词切分成字符再结合人工选取的特征(如方向线素特征和梯度特征),再用欧氏距离分类器[6-8],最终得到候选结果㊂2019年,该实验室又利用 翻字典 原理设计了从字符到单词的两级级联分类器[9],完成了维文单词的识别㊂以上方法均是手工选取特征结合分类器进行识别,在特征选择方面具有一定的局限性㊂本文选用经改进的卷积循环神经网络(convolutional recurrent neural network,CRNN)和连接时序分类器(connectionist temporal classification,CTC)作为文字识别的核心算法,Django 作为构建前后端的框架,搭建了完整的维文整词识别系统㊂1㊀算法原理1.1㊀卷积循环网络用于文字识别领域的卷积循环神经网络(CRNN)是由Shi 等提出的[10]㊂它由深层卷积网络(deep conv-郑州大学学报(理学版)第53卷olutional neural network,DCNN)加循环网络(recurrent neural network,RNN)构成㊂随着计算机视觉领域研究的持续火热,DCNN 经常被用于图像特征提取,同时,它也在目标检测[11-12]㊁情感分析[13]㊁图像处理[14]等方面表现优异,但是文字的检测与识别不同于一般的目标检测任务,基于特征(人工设计或CNN 得到)及分类的方法往往忽视了文本隐含上下文关联的特点㊂RNN 能处理序列信息,在原有的CRNN 中,RNN 部分是双向长短期记忆网络(bi-directioanl long short-term memory,BiLSTM),但其结构复杂,训练收敛速度慢㊂本文将BiLSTM 替换为更为简洁的双向门控循环神经单元网络(bi-directioanl gated recurrent 
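unit, BiGRU) [15]. Experiments show that the improved CRNN converges faster than the original network and also achieves slightly higher test accuracy.

1.2 Gated recurrent unit network (GRU)

The GRU evolved step by step from the RNN and the LSTM. The LSTM solved the exploding and vanishing gradients that RNNs tend to suffer from during training. Compared with the LSTM's more complex three-gate structure (input gate, forget gate, and output gate), the GRU simplifies it to two gates (update gate and reset gate). This leaner structure shortens training convergence time, is more computationally efficient, and improves model accuracy. The internal structure of the GRU is shown in Figure 1.

Figure 1 Diagram of GRU structure

The forward-propagation equations of the GRU [15] are

\[
\begin{aligned}
z_t &= \sigma\left(W_z \cdot [h_{t-1}, x_t]\right),\\
r_t &= \sigma\left(W_r \cdot [h_{t-1}, x_t]\right),\\
\tilde{h}_t &= \tanh\left(W_h \cdot [r_t \ast h_{t-1}, x_t]\right),\\
h_t &= (1 - z_t) \ast h_{t-1} + z_t \ast \tilde{h}_t,
\end{aligned}
\]

where $z_t$ is the update gate; $\sigma$ is the sigmoid activation function; $W_z$ is the update-gate weight matrix; $h_{t-1}$ is the hidden-layer output at time $t-1$; $x_t$ is the input at time $t$; $[\,\cdot\,,\cdot\,]$ denotes the concatenation of two vectors; $r_t$ is the reset gate; $W_r$ is the reset-gate weight matrix; $\tilde{h}_t$ is the candidate hidden state at time $t$; $\tanh$ is the hyperbolic tangent activation function; $W_h$ is the candidate hidden-state weight matrix; $\ast$ denotes the Hadamard product; and $h_t$ is the hidden-layer output at time $t$.

The reset gate controls how much previous information is kept: the more history is forgotten, the smaller its value. The update gate mainly determines how much history is added to the current state; after sigmoid activation its value lies between 0 and 1. Together the two gates determine the output of the hidden state. This paper combines a forward GRU and a backward GRU into a bidirectional GRU (BiGRU) and stacks two layers for sequence modeling, with 256 hidden units, as shown in Figure 2.

Figure 2 Diagram of system frame

1.3 Connectionist temporal classification

Connectionist temporal classification (CTC) is an algorithm for producing output when the input and output sequences have different lengths (the sequence alignment problem). It was first proposed by Graves, who later applied CTC successfully to speech recognition [16]. Training requires neither segmented corpora nor an intermediate phonetic representation, and the error rate on the test set was as low as 17.7%. The decoding algorithm effectively handles input and output sequences of unequal length.

In the mathematical model, the CTC layer, also called the transcription layer, takes the prediction sequence $\{x_1, x_2, \ldots, x_T\}$ of length $T$ output by the previous (RNN) layer and searches for the label sequence with the highest probability:

\[
p(l \mid x) = \sum_{\pi \in F^{-1}(l)} p(\pi \mid x), \tag{1}
\]

\[
p(\pi \mid x) = \prod_{t=1}^{T} y^{t}_{\pi_t}. \tag{2}
\]

Equation (1) states that the probability of an output label sequence is the sum of the probabilities of multiple paths, where $l$ is the label sequence, $x$ is the input sequence, $\pi$ is an output path, and $F^{-1}$ is the mapping from a label sequence to its paths. Equation (2) gives the probability of path $\pi$ given input $x$, where $T$ is the length of the input sequence and $y^{t}_{\pi_t}$ is the probability of emitting the path's label $\pi_t$ at time $t$.

A CTC example for Uyghur recognition:
1) the RNN layer outputs a label sequence in which time steps without a label are filled with "-";
2) CTC merges repeated labels and removes the blank label "-", keeping only the remaining labels;
3) the whole process yields the word whose Chinese meaning is "建立" (establish).

In this example the input sequence has length 26 before CTC alignment and the output sequence has length 5 afterwards, which shows that CTC effectively solves the sequence alignment problem.

2 Design and implementation of the recognition system

2.1 System framework

The system is built with the open-source Django framework and follows the M (model) T (template) V (view) pattern. The user sends a request from the browser, urls.py routes it to a view, and the view calls the corresponding template and model. The view handles the business logic, the template (mainly HTML files) handles page presentation, and the model handles database objects and business objects. This loose coupling and mutual independence make the system easy to develop and maintain. The system flow is shown in Figure 3.

2.2 System functions and demonstration

A command entered on the back end starts the service, and entering the URL in a browser opens the web service.
1) Upload. Select a local image and click submit; the image is automatically uploaded to the SQLite database on the back end.
2) Recognition. The back end reads the image from the database, calls the recognition and timing modules, and returns the recognition result (meaning "创造力", creativity, in Chinese), the time (0.07 s), the original image, and the image name to the front-end page for display, as shown in Figure 4.

Figure 3 Diagram of system frame

Figure 4 Photo of system

3 Experiments and results

3.1 Experimental data

1) Training data (synthetic data)

A script written in Java synthesized about 100,000 images covering the 32 Uyghur letters (8 vowels, 24 consonants) (JPG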
格式),以及对应的标签数据(TXT 格式)㊂同时,为了使训练样本更具代表性,本文对32个维文字母作了数据均衡处理㊂部分训练图片如图5所示㊂图5㊀部分训练数据Figure 5㊀The sample of training data2)测试数据(真实数据)从天山网(维文版)(http:ʊ /)中的不同栏目进行收集并制作成测试图片和标签㊂总数约1500张,部分测试数据图片如图6所示㊂图6㊀部分测试数据Figure 6㊀The sample of testing data3.2㊀实验设置为了验证系统的有效性,本文设置了对比实验㊂采用约10万张图片作为训练数据,分别在CRNN 和改进的CRNN(BiGRU)上训练,并将训练得到的模型文件分别在测试集上进行测试㊂实验中的PC 机主要配置为:Nvidia 独立显卡(1060Ti 6G 内存)等㊂所依赖的软件及环境为:Pycharm(社区版)编译工具㊁Ubuntu18.04操作系统㊁Python3.6.2编程语言㊁Pytorch1.2.0等㊂1)实验中精度的定义为A =(n t /n s )㊃100%,其中:n t 代表正确识别样本数;n s 代表样本总数;A 代表识别精度㊂2)实验中识别速度的定义为v =1/(t o -t i ),其中:t o 代表获得字符串时刻;t i 代表输入图片时刻;v 代表识别速度㊂3)实验中网络训练的损失函数定义为O =-ðI i ,l i ɪχlog p (l i y i ),其中:χ代表训练集;I i 代表输入图片;l i 代表标签序列;y i 代表循环层产生的概率预测序列㊂从损失函数可知,它直接从输入的维文印刷体图片I i 和对应的单词标签序列l i 中计算损失值,网络实现了字母免分割的训练㊂3.3㊀实验结果本实验对两种方法均进行了充分训练,当损失趋于收敛后,保留最终模型文件,其中CRNN(BiGRU)收敛速度更快㊂在测试集上,CRNN 的精度为94.1%,CRNN(BiGRU)的精度为95.7%,平均速度为12.5fps,表现出较好性能㊂究其原因,循环层由BiGRU 替换,简化了模型结构,加快了模型训练收敛速度,提高了计算效率㊂此外,训练数据均衡也使得识别率较为稳定㊂2131㊀第3期熊黎剑,等:基于Django印刷体维吾尔文识别系统的设计与实现4 结语针对传统维文识别方法特征表示不足和基于切分的识别方法易出错等问题,本文从整词识别入手,采用卷积神经网络自动提取文字的深层次抽象特征,并对循环层进行改进,用BiGRU替换原有的BiLSTM,改善了识别性能㊂引入连接时序分类器,很好地解决了维文字符难切分以及不等长输入输出问题㊂测试识别精度达到95.7%,平均速度达到12.5fps㊂最后,利用Django框架,设计了一个端到端的维文整词识别系统㊂因此,该系统具有一定的实际应用价值㊂然而,现有系统只能识别纯维文(不含数字㊁字符),从实际应用的角度来看,后续工作可将常用的符号和数字纳入识别系统,进一步完善该系统㊂参考文献:[1]㊀UBUL K,TURSUN G,AYSA A,et al.Script identification of multi-script documents:a survey[J].IEEE access,2017,5:6546-6559.[2]㊀彭勇,哈力旦㊃阿布都热依木,丁维超.基于改进单深层神经网络的自然场景中维吾尔文检测[J].计算机应用研究,2019,36(9):2876-2880.PENG Y,HALIDAN A,DING W C.Uyghur text detection in natural scene based on improved single deep neural network[J].Application research of computers,2019,36(9):2876-2880.[3]㊀艾力㊃居麦,哈力旦㊃A,黄浩.视频图像中维吾尔文字的识别研究[J].计算机工程与应用,2011,47(36):190-192.ELI J M,HALIDAN A,HUANG H.Recognition of extracting Uyghur texts from videos images[J].Computer engineering and applications,2011,47(36):190-192.[4]㊀买买提依明㊃哈斯木,吾守尔㊃斯拉木,维尼拉㊃木沙江,等.基于统计专用字符的维㊁哈㊁柯文文种识别研究[J].中文信息学报,2015,29(2):111-117.MAIMAITIYIMING H,WUSHOUER S,WEINILA M,et al.Unique character based 
statistical language identification for Uyghur,Kazak and Kyrgyz[J].Journal of Chinese information processing,2015,29(2):111-117.[5]㊀于丽,亚森㊃艾则孜.基于HOG特征和MLP分类器的印刷体维吾尔文识别方法[J].微型电脑应用,2017,33(6):30-33.YU L,YASEN A.A printed Uyghur recognition method based on HOG feature and MLP classifier[J].Microcomputer applica-tions,2017,33(6):30-33.[6]㊀许亚美.手写维吾尔文字识别若干关键技术研究[D].西安:西安电子科技大学,2014.XU Y M.A study of key techniques for Uighur handwriting recognition[D].Xiᶄan:Xidian University,2014.[7]㊀白云辉.印刷体维吾尔文单词识别[D].西安:西安电子科技大学,2014.BAI Y H.Printed Uyghur word recognition[D].Xiᶄan:Xidian University,2014.[8]㊀郎潇.基于切分的印刷体维吾尔文单词识别[D].西安:西安电子科技大学,2015.LANG X.Recognition of printed Uyghur words based on segmentation[D].Xiᶄan:Xidian University,2015.[9]㊀李旦旦.印刷体维吾尔文单词识别的分类器设计[D].西安:西安电子科技大学,2019.LI D D.Classifier design for printed Uyghur word recognition[D].Xiᶄan:Xidian University,2019.[10]SHI B G,BAI X,YAO C.An end-to-end trainable neural network for image-based sequence recognition and its application toscene text recognition[J].IEEE transaction on pattern analysis&machine intelligence,2017,39:2298-2304. 
[11]张建明,刘煊赫,吴宏林,等.面向小目标检测结合特征金字塔网络的SSD改进模型[J].郑州大学学报(理学版),2019,51(3):61-66,72.ZHANG J M,LIU X H,WU H L,et al.Improved SSD model with feature pyramid network for small object detection[J].Jour-nal of Zhengzhou university(natural science edition),2019,51(3):61-66,72.[12]佘颢,吴伶,单鲁泉.基于SSD网络模型改进的水稻害虫识别方法[J].郑州大学学报(理学版),2020,52(3):49-54.SHE H,WU L,SHAN L Q.Improved rice pest recognition based on SSD network model[J].Journal of Zhengzhou university (natural science edition),2020,52(3):49-54.[13]陈珂,梁斌,左敬龙,等.一种用于中文微博情感分析的多粒度门控卷积神经网络[J].郑州大学学报(理学版),2020,52(3):21-26,33.41郑州大学学报(理学版)第53卷CHEN K,LIANG B,ZUO J L,et al.Multiple grains-gated convolutional neural networks for Chinese microblog sentiment anal-ysis[J].Journal of Zhengzhou university(natural science edition),2020,52(3):21-26,33.[14]王知人,谷昊晟,任福全,等.基于深度卷积残差学习的图像超分辨[J].郑州大学学报(理学版),2020,52(3):42-48.WANG Z R,GU H S,REN F Q,et al.Residual learning of deep CNN for image super-resolution[J].Journal of Zhengzhou university(natural science edition),2020,52(3):42-48.[15]WANG Y S,LIAO W L,CHANG Y Q.Gated recurrent unit network-based short-term photovoltaic forecasting[J].Energies,2018,11(8):2163.[16]GRAVES A,MOHAMED A R,HINTON G.Speech recognition with deep recurrent neural networks[C]ʊIEEE InternationalConference on Acoustics,Speech and Signal Processing.New York:IEEE Press,2013:6645-6649.Design and Implementation of Printed Uyghur Recognition SystemBased on DjangoXIONG Lijian1,2,3,WUSHOR Silamu1,2,3,XU Miaomiao1,2,3(1.School of Information Science and Engineering,Xinjiang University,Urumqi830046,China;2.Xinjiang Multilingual Information Technology Laboratory,Urumqi830046,China;3.Xinjiang Multilingual Information Technology Research Center,Urumqi830046,China) Abstract:Optical character recognition(OCR)has been widely used in many fields such as book digiti-zation and document pared with the more mature Chinese and English printed recogni-tion system,there is still room for research and practical application of Uyghur printed 
recognition.Ai-ming at the problem of insufficient feature representation of traditional recognition methods,the rising deep learning technology was combined,the Python language programming was used,the improved conv-olutional recurrent neural network as the core of recognition algorithm was selected,and Django was used to design the system framework.The experimental results showed that the accuracy of the system was 95.7%and the average speed was12.5fps,which realized the end-to-end Uyghur whole word recogni-tion.Key words:convolutional recurrent neural network;gated recurrent unit;connectionist temporal classifi-cation;printed Uyghr(责任编辑:王浩毅㊀方惠敏)。
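The GRU update of section 1.2 and the CTC collapsing step of section 1.3 in the paper above can be sketched in plain Python. This is a toy under stated assumptions, not the authors' PyTorch model: the GRU uses scalar states and weights with no bias terms, `greedy_decode` takes the single most likely path instead of summing over all paths as equation (1) does, and Latin letters stand in for the Uyghur labels.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def gru_step(h_prev, x, w_z, w_r, w_h):
    """One scalar GRU step; each w_* is a (weight on h, weight on x) pair."""
    z = sigmoid(w_z[0] * h_prev + w_z[1] * x)               # update gate
    r = sigmoid(w_r[0] * h_prev + w_r[1] * x)               # reset gate
    h_cand = math.tanh(w_h[0] * (r * h_prev) + w_h[1] * x)  # candidate state
    return (1.0 - z) * h_prev + z * h_cand                  # new hidden state


def ctc_collapse(path, blank="-"):
    """Merge consecutive repeated labels, then drop the blank label."""
    out, prev = [], None
    for symbol in path:
        if symbol != prev and symbol != blank:
            out.append(symbol)
        prev = symbol
    return "".join(out)


def greedy_decode(probs, alphabet, blank="-"):
    """Pick the most likely symbol at each time step, then collapse the path."""
    path = [alphabet[max(range(len(row)), key=row.__getitem__)] for row in probs]
    return ctc_collapse(path, blank)
```

For instance, `ctc_collapse("--hh-e-ll-lo--")` gives `"hello"`: the blank between the two runs of `l` is what lets a doubled letter survive the merge, which is why repeats are merged before blanks are removed.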

ABBYY FineReader Engine性能指南


ABBYY FineReader Engine Performance Guide

Integrating optical character recognition (OCR) technology will effectively extend the functionality of your application. Excellent performance of the OCR component is one of the key factors for high customer satisfaction.

This document provides information on general OCR performance factors and the possibilities to optimize them in the Software Development Kit ABBYY FineReader Engine. By utilizing its advanced capabilities and options, the high OCR performance can be improved even further for optimal customer experience.

When measuring OCR performance, there are two major parameters to consider:
RECOGNITION ACCURACY
PROCESSING SPEED

Which Factors Influence the OCR Accuracy and Processing Speed?

Image Type and Image Quality

Images can come from different sources. Digitally created PDFs, screenshots of computer and tablet devices, image files created by scanners, fax servers, digital cameras or smartphones: various image sources will lead to different image types with different levels of image quality. For example, using the wrong scanner settings can cause "noise" on the image, like random black dots or speckles, blurred and uneven letters, or skewed lines and shifted table borders. In terms of OCR, this is a 'low-quality image'.

Processing low-quality images requires high computing power, increases the overall processing time and deteriorates the recognition results.

On the other hand, processing 'high-quality images' without distortions reduces the processing time. Additionally, reading high-quality images leads to higher accuracy results.

Therefore, it is recommended to use high-quality images for the OCR process.

If it is not possible to influence the image quality in advance, it is recommended to enhance it prior to the recognition step.
In FineReader Engine, various powerful image preprocessing functions are available:

[…] it is possible to use automatic language detection. […] a high number of preselected recognition languages […]

To increase the recognition accuracy even more, FineReader Engine provides dictionary and morphology support for many languages. When processing documents including subject-specific terms or "structures" such as product codes, telephone numbers or passport numbers, custom-created dictionaries can be imported to ensure high recognition quality.

[…] from a scanner or imported from the storage system or the memory stream. Obtaining images from different sources requires different methods and influences the recognition speed. Image import from memory is generally faster than opening the images from file storage.

Image Preprocessing

Generally, the OCR process is faster for good-quality images. It is recommended to fine-tune the image preprocessing step accordingly and therefore save time during the actual processing step.

Images can be of different formats and quality. High-quality images, such as digitally created PDFs, typically do not require a lot of preprocessing work. For low-quality images, like scanned documents with incorrect scanner settings or old books, it is necessary to apply advanced image preprocessing functions to improve the recognition results. For the preprocessing of digital photos, the special ABBYY Camera OCR™ technology is applied. Here, the algorithms are optimized specifically for photo enhancement. Each preprocessing function has its own effect on the processing speed.

The different methods and parameters used for specific processing scenarios will significantly influence the overall processing speed. Discarding unnecessary stages can speed up the entire OCR process. For example, when extracting data from predefined document areas, the document analysis is not required.
When exporting the documents to TXT format or to PDF Image Only format, the synthesis stage can be skipped.

Document Analysis

Brochures or newspapers often contain text in columns, tables, diagrams and pictures. Technical drawings might be large documents including complex engineering diagrams with different text orientation. For documents with such complicated layouts, the document analysis step will require more processing time. On the other hand, the analysis of simple-layout documents like letters or contracts is very fast.

Parallel Processing Using Multiple Cores

FineReader Engine can be used to build applications of any scale and complexity, from a client workstation to a server-based solution or a large multi-million page project. When it comes to the OCR performance, it often makes sense to utilize multi-processor or multi-core systems to increase the processing speed.

Built-in multi-core support in FineReader Engine allows different approaches to scale up the OCR process:
• Utilizing a single Engine instance
• Loading several Engine instances

There are different approaches for processing documents:

Processing of Large Multi-Page Documents

For parallel processing of large documents with many pages the ‘FRDocument’ object is best suited. In this case, the pages of a multi-page document are processed in parallel on the CPUs available. At the end, the results are combined into one multi-page document. It is the easiest multiprocessing approach to code. The number of processes needed is detected automatically, depending on factors such as the number of available physical or logical CPU cores, the number of free CPU cores available in the license, and the number of pages in the document. If necessary, the developer can easily change the multiprocessing settings and tune the number of processes to be run.

Processing of Many Single-Page Documents

To process many one-page documents in parallel, which are received from the same source, e.g.
a scanner, it is recommended to use the ‘BatchProcessor’ object. This object is most effective in terms of speed when document export is not required, as in data capture scenarios with a custom output format.

To perform full processing of many one-page documents in parallel, it is recommended to use a pool of Engine instances. This approach is also best suited for web-service scenarios, when the input document should be processed directly after it was submitted. In this case, the document is passed to an available FineReader Engine instance from the pool and processed immediately.

FineReader Engine - Speed Testing Results

The table presents the results of internal performance testing. Please be aware that testing results always depend on many factors, such as image quality, used recognition languages and other factors.

System Resources

During the OCR process, a range of different algorithms are applied. They depend on image quality, document languages, layout complexity and number of pages in the document. Accordingly, such algorithms might require higher memory resources. It is recommended to set up the system in accordance with the outlined memory requirements to optimize the processing speed by allocating adequate system memory.

Technical Test Information

Intel® Core™ i5-4440 (3.10 GHz, 4 physical cores), 8 GB RAM, 4 processes running simultaneously.

The performance was tested on 300 documents in English, using the ‘DocumentArchiving_Speed’ predefined profile. In the scenarios "One-page documents" and "One multi-page document" the documents were exported as PDF format.

* The text was extracted from pre-defined areas on one-page documents.
No export to any file format was performed.

How to Increase the Overall Processing Speed in FineReader Engine

There are several possibilities to improve the performance of your system:
• Fine-tune the image preprocessing settings to deliver the highest document quality for the processing step.
• During the processing step, use one of the predefined processing profiles optimized for speed and the appropriate recognition mode (balanced or fast).
• Specify the correct recognition languages. Incorrect language can significantly slow down document processing. The more recognition languages are selected, the slower the speed of processing.
• Use the appropriate object (FRDocument or BatchProcessor) and enable parallel processing.
• Specify appropriate parameters of analysis and recognition. For example, disable table detection and page orientation correction if images contain no tables and have correct page orientation.
• Omit the synthesis stage if the processed documents will be exported to TXT format or PDF Image Only format.
• Use the Fast PDF Export Profile when exporting the documents to the PDF format.
• Use the special object (ExportFileWriter), which is designed for the export of very large multi-page documents into PDF format.

For more information, refer to the FineReader Engine Developer’s Help:

How to Improve the Text Recognition Quality in FineReader Engine

FineReader Engine offers high recognition quality. The recognition quality will always depend on factors such as image quality, language and other factors. However, there are several ways to increase the recognition quality:
• Specify the correct text type.
• Specify the appropriate recognition languages.
• Define unique languages and custom dictionaries for the recognition of special characters or documents with specific terminology, e.g.
legal or healthcare texts.
• Split the facing pages of scanned books into two separate images.
• Apply the special Camera OCR technology when digital photos are processed.
• Correct the resolution of the image if it differs significantly from the recommended resolution.

For more information refer to the FineReader Engine Developer’s Help:

As you know, in document processing the OCR process can be a very complex task. Depending on the individual document processing scenario, the OCR performance results can vary significantly. The tips for recognition accuracy and processing speed optimisation in ABBYY FineReader Engine should help you to achieve the optimal performance for your business case.

Additional Information Resources

To learn more about the different aspects of OCR performance optimisation and about the SDK ABBYY FineReader Engine, please use the following sources of information:
• ABBYY Technology portal: https://abbyy.technology/
• ABBYY OCR SDK forum: https:///
• The help file provided in the ABBYY FineReader Engine distribution
• ABBYY FineReader Engine product pages on /ocr-sdk .

Individual Project Support

If you would like to discuss a particular project, please contact us. During the testing period, you can ask standard technical questions to our ABBYY Technical Support or use ABBYY Professional Services for advanced consultancy, in-depth project analysis and individual code review. Using ABBYY’s technical resources can shorten your development work and speed up your project. Please contact the ABBYY sales manager if you wish to further explore these options.

If you have additional questions, contact your local ABBYY representative listed under /contacts or use the online contact form /ocr-sdk/#request-demo .

This software includes ABBYY® FineReader® Engine 12 recognition technologies. © 2017, ABBYY Production LLC. ABBYY, FINEREADER and ABBYY FineReader are either registered trademarks or trademarks of ABBYY Software Ltd.
All product names, trademarks and registered trademarks are property of their respective owners. Windows® is a registered trademark of Microsoft Corporation in the United States and other countries. The registered trademark Linux® is used pursuant to a sublicense from LMI, the exclusive licensee of Linus Torvalds, owner of the mark on a world-wide basis. Mac® and OS X® are trademarks of Apple Inc., registered in the U.S. and other countries. Datalogics®, The DL Logo®, PDF2IMG™ and DLE™ are trademarks of Datalogics, Inc. Adobe®, The Adobe Logo®, Adobe® PDF Library™, Powered by Adobe PDF Library logo, Reader® are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States and/or other countries. Portions of this computer program are copyright © 1996-2007 LizardTech, Inc. The software is based in part on the work of the Independent JPEG Group. Portions of this software are copyright ©2011 University of New South Wales. Unicode support: © 1991-2013 Unicode, Inc. Intel® Performance Primitives: Copyright © 2002-2008 Intel Corporation. Portions of this software are copyright © 1996-2002, 2006 The FreeType Project . WIBU, CodeMeter, SmartShelter, and SmartBind are registered trademarks of Wibu-Systems. All rights reserved. All other trademarks are the property of their respective owners. #9612en。
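The guide's advice to hand each incoming one-page document to an available engine instance from a pool can be illustrated with a small, generic sketch. This is not the FineReader Engine API: `EnginePool` and `make_fake_engine` are hypothetical names, the "engine" is a plain Python callable standing in for a loaded OCR engine, and threads are used only to keep the example short (a real deployment would more likely hold one engine per worker process).

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Queue


class EnginePool:
    """Hypothetical pool: each document goes to whichever engine is idle."""

    def __init__(self, engines):
        self._idle = Queue()
        for engine in engines:
            self._idle.put(engine)
        # One worker per engine, so a worker never waits long for an instance.
        self._executor = ThreadPoolExecutor(max_workers=len(engines))

    def _run(self, document):
        engine = self._idle.get()  # block until an instance is free
        try:
            return engine(document)
        finally:
            self._idle.put(engine)  # hand the instance back to the pool

    def submit(self, document):
        """Queue a document and return a future for its result."""
        return self._executor.submit(self._run, document)

    def shutdown(self):
        self._executor.shutdown(wait=True)


def make_fake_engine(name):
    # Stand-in for a loaded OCR engine instance (assumption, not ABBYY code).
    def recognize(document):
        return f"{name}:{document.upper()}"
    return recognize


pool = EnginePool([make_fake_engine(f"engine{i}") for i in range(4)])
futures = [pool.submit(doc) for doc in ["page one", "page two", "page three"]]
results = [future.result() for future in futures]
pool.shutdown()
```

Results come back in submission order because each future is read in turn, even though the documents may be recognized by different instances.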

Integrating Baidu OCR into a WeChat Mini Program (ID Card Recognition)


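1. API description

The API performs structured recognition of all 8 fields on the front and back of a second-generation resident ID card: name, gender, ethnicity, date of birth, address, ID number, issuing authority, and validity period, with a recognition accuracy above 99%. It also detects the portrait on the front of the card and returns the base64 encoding and position of the cropped portrait.

In addition, it can run risk and quality checks on the uploaded ID card image: whether the image is a photocopy or a temporary ID card, whether it has been re-photographed or edited, and whether it has quality problems such as front and back being swapped, blur, underexposure, or overexposure.

Request example

HTTP method: POST

URL parameters:

Parameter | Value
access_token | The access_token obtained with the API Key and Secret Key; see “”

Header:

Parameter | Value
Content-Type | application/x-www-form-urlencoded

The request parameters go in the body:

Parameter | Required | Type | Allowed values | Description
image | Yes | string | - | Image data, base64-encoded and then urlencoded. The size after base64 encoding and urlencoding must not exceed 4M; the shortest side must be at least 15px and the longest side at most 4096px; jpg/jpeg/png/bmp formats are supported.
id_card_side | Yes | string | front/back | front: the side of the ID card with the photo; back: the side with the national emblem.
detect_direction | No | string | true/false | Whether to detect the rotation angle of the image. Detection is on by default (true). Orientation means whether the input image is upright or rotated counterclockwise by 90/180/270 degrees. true: detect the rotation angle; false: do not detect it.
detect_risk | No | string | true/false | Whether to enable ID card risk-type detection (photocopy, temporary ID card, re-photographed ID card, modified ID card). Off by default (false). true: on; false: off.
detect_photo | No | string | true/false | Whether to detect the portrait. Not detected by default.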
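As a rough illustration of the request format described above, the following sketch builds the URL, header, and form body without sending anything. The function name `build_idcard_request` is hypothetical, and the endpoint is passed in by the caller because the request URL is not given in this text; only the parameter names, the content type, and the 4M limit come from the table.

```python
import base64
from urllib.parse import quote_plus, urlencode

# Limit from the parameter table: size after base64 + urlencode.
MAX_ENCODED_IMAGE_BYTES = 4 * 1024 * 1024


def build_idcard_request(image_bytes, side, access_token, endpoint,
                         detect_risk=False, detect_photo=False):
    """Return (url, headers, body) for an ID card OCR request.

    `endpoint` is a placeholder supplied by the caller (assumption).
    """
    if side not in ("front", "back"):
        raise ValueError("id_card_side must be 'front' or 'back'")

    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    if len(quote_plus(image_b64)) > MAX_ENCODED_IMAGE_BYTES:
        raise ValueError("encoded image exceeds the 4M limit")

    fields = {
        "image": image_b64,
        "id_card_side": side,
        "detect_risk": "true" if detect_risk else "false",
        "detect_photo": "true" if detect_photo else "false",
    }
    # urlencode applies the urlencoding step to the base64 string.
    body = urlencode(fields).encode("ascii")
    url = f"{endpoint}?{urlencode({'access_token': access_token})}"
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return url, headers, body


url, headers, body = build_idcard_request(
    b"abc", "front", "TOKEN", "https://example.invalid/ocr")
```

The returned triple can be handed to any HTTP client as a POST request; the example values (`b"abc"`, `TOKEN`, the `example.invalid` host) are dummies.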

How to Recognize Bank Cards and ID Cards in a WeChat Mini Program


Recognizing a bank card: cloud function card2/index.js:

    const cloud = require('wx-server-sdk')
    cloud.init({
      env: cloud.DYNAMIC_CURRENT_ENV,
    })
    exports.main = async (event, context) => {
      try {
        // Recognize the bank card in the image at event.imgCard2
        const result = await cloud.openapi.ocr.bankcard({
          "type": 'photo',
          "imgUrl": event.imgCard2
        })
        return result
      } catch (err) {
        return err
      }
    }

Recognizing an ID card: cloud function card1/index.js:

    const cloud = require('wx-server-sdk')
    cloud.init({})
    exports.main = async (event, context) => {
      try {
        // Recognize the ID card in the image at event.imgCard
        const result = await cloud.openapi.ocr.idcard({
          "type": 'photo',
          "imgUrl": event.imgCard
        })
        console.log(result)
        return result
      } catch (err) {
        return err
      }
    }

shibie.wxml:

    <button bindtap="shibie2">Recognize bank card</button>
    <text>Bank card number: {{number}}</text>
    <button bindtap="shibie">Recognize ID card</button>
    <view>Name: {{name}}</view>
    <view>ID number: {{id}}</view>
    <view>Gender: {{gender}}</view>

shibie.js:

    Page({
      shibie2() { // Recognize a bank card
        var that = this
        wx.cloud.callFunction({
          name: "card2",
          data: {
            imgCard2: "https:///%E9%93%B6%E8%A1%8C%E5%8D%A1.jpg?sign=71270da3612790663bf818d02ee3f994&t=1624794493"
          },
          success(res) {
            console.log("Recognition succeeded", res)
            that.setData({
              number: res.result.number
            })
          },
          fail(res) {
            console.log("Recognition failed", res)
          },
        })
      },
      shibie() { // Recognize an ID card
        var that = this
        wx.cloud.callFunction({
          name: "card1",
          data: {
            imgCard: "https:///%E8%BA%AB%E4%BB%BD%E8%AF%81.jpg?sign=2fa017e88a2bd0e96f18a0655c8034a6&t=1624794751"
          },
          success(res) {
            console.log("Recognition succeeded", res)
            that.setData({
              name: res.result.name,
              id: res.result.id,
              gender: res.result.gender
            })
          },
          fail(res) {
            console.log("Recognition failed", res)
          },
        })
      },
    })


Getting lots of data: Artificial data synthesis
Machine Learning
Character recognition
Andrew Ng
Artificial data synthesis for photo OCR
[Figure: the sample text "Abcdefg" rendered four times]
Another ceiling analysis example
Face recognition from images (Artificial example)
Camera image Preprocess (remove background)
Eyes segmentation
Text detection
Text detection
Positive examples
Negative examples
Text detection
[David Wu]
1D Sliding window for character segmentation
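A one-dimensional sliding window for character segmentation can be sketched as follows: a fixed-width window moves left to right across the detected text strip, and a classifier decides at each position whether the window's centre is a gap between two characters. The names `segment_characters` and `blank_centre` are illustrative, and `blank_centre` is a hand-written stand-in for the trained classifier the slides assume.

```python
def segment_characters(strip, window_width, is_split):
    """Slide a window along a text strip and collect predicted split columns.

    strip: list of rows, each a list of 0/1 ink values (toy representation).
    is_split: classifier taking a window (list of rows) and returning bool.
    """
    width = len(strip[0])
    half = window_width // 2
    splits = []
    for centre in range(half, width - half):
        window = [row[centre - half:centre + half + 1] for row in strip]
        if is_split(window):
            splits.append(centre)
    return splits


def blank_centre(window):
    # Stand-in for a learned classifier (assumption): predict a split
    # when the middle column of the window contains no ink.
    mid = len(window[0]) // 2
    return all(row[mid] == 0 for row in window)


strip = [
    [1, 1, 0, 1, 1, 0, 1],
    [1, 0, 0, 1, 1, 0, 1],
]
splits = segment_characters(strip, window_width=3, is_split=blank_centre)
print(splits)  # [2, 5]
```

Cutting the strip at the returned columns yields one patch per character, which is what the character classification step consumes.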
Application example: Photo OCR
Ceiling analysis: What part of the pipeline to work on next
Machine Learning
Estimating the errors due to each component (ceiling analysis)
3. Character classification
A N T
Photo OCR pipeline
Image
Text detection
Character segmentation
Character recognition
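The pipeline can be read as three functions composed in sequence. The sketch below wires them together with toy stand-ins so it runs on plain strings; `run_pipeline` and the three `toy_*` stages are hypothetical names, and in a real system each stage would be a learned component operating on image data.

```python
def run_pipeline(image, detect_text, segment_characters, recognize_character):
    """Image -> text regions -> character patches -> recognized strings."""
    words = []
    for region in detect_text(image):
        patches = segment_characters(region)
        words.append("".join(recognize_character(p) for p in patches))
    return words


# Toy stages (assumptions): the "image" is a string, regions are
# whitespace-separated chunks, and each character is its own patch.
def toy_detect_text(image):
    return image.split()


def toy_segment_characters(region):
    return list(region)


def toy_recognize_character(patch):
    return patch.upper()


words = run_pipeline("ant  shop", toy_detect_text,
                     toy_segment_characters, toy_recognize_character)
print(words)  # ['ANT', 'SHOP']
```

Keeping the stages as separate callables is what makes ceiling analysis possible later: any one stage can be swapped for a perfect, hand-labelled version while the rest stay unchanged.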
Application example: Photo OCR
Component                         Accuracy
Overall system                    85%
Preprocess (remove background)    85.1%
Face detection                    91%
Eyes segmentation                 95%
Nose segmentation                 96%
Mouth segmentation                97%
Logistic regression               100%
[Adam Coates and Tao Wang]
Synthesizing data by introducing distortions: Speech recognition
Original audio
Audio on bad cellphone connection
Noisy background: Crowd
Real data
[Adam Coates and Tao Wang]
Abcdefg
Artificial data synthesis for photo OCR
Real data
[Adam Coates and Tao Wang]
Synthetic data
Synthesizing data by introducing distortions
[Adam Coates and Tao Wang]
Discussion on getting more data
1. Make sure you have a low bias classifier before expending the effort. (Plot learning curves). E.g. keep increasing the number of features/number of hidden units in neural network until you have a low bias classifier.
2. “How much work would it be to get 10x as much data as we currently have?”
   - Artificial data synthesis
   - Collect/label it yourself
   - “Crowd source” (E.g. Amazon Mechanical Turk)
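Point 1 recommends plotting learning curves before collecting more data. A minimal sketch of what such a curve measures is given below, under loudly stated assumptions: the data are synthetic (y = x² on [-1, 1]), the model is a deliberately too-simple straight line, and all function names are illustrative. The expectation is that training and validation error settle at a similar, non-negligible value as the training set grows, which is the high-bias pattern where more data would not help.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def mse(model, xs, ys):
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)


def learning_curve(train, val, sizes):
    """Return (m, training error, validation error) for each size m."""
    val_x, val_y = zip(*val)
    curve = []
    for m in sizes:
        xs, ys = zip(*train[:m])
        model = fit_line(xs, ys)
        curve.append((m, mse(model, xs, ys), mse(model, val_x, val_y)))
    return curve


rng = random.Random(0)


def sample(n):
    # Synthetic data (assumption): a quadratic target a line cannot fit.
    points = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        points.append((x, x * x))
    return points


train, val = sample(200), sample(200)
curve = learning_curve(train, val, sizes=[2, 10, 50, 200])
for m, train_err, val_err in curve:
    print(f"m={m:4d}  train={train_err:.3f}  val={val_err:.3f}")
```

If the two errors had instead stayed far apart at the largest size, that would point to high variance, and the data-collection options in point 2 would be worth the effort.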
Image → Text detection → Character segmentation → Character recognition
What part of the pipeline should you spend the most time trying to improve?
Component                 Accuracy
Overall system            72%
Text detection            89%
Character segmentation    90%
Character recognition     100%
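The accuracy figures above can be turned into per-stage gains with simple subtraction: each row is the overall accuracy once that stage, and every stage before it, is given perfect output, so the difference from the previous row is the most that perfecting the stage could add. The helper name `ceiling_gains` is illustrative.

```python
def ceiling_gains(baseline, stages):
    """Return (stage, gain in percentage points) for each pipeline stage.

    stages: (name, overall accuracy with this and all earlier stages
    made perfect), listed in pipeline order.
    """
    gains = []
    previous = baseline
    for name, accuracy in stages:
        gains.append((name, round(accuracy - previous, 1)))
        previous = accuracy
    return gains


stages = [
    ("Text detection", 89.0),
    ("Character segmentation", 90.0),
    ("Character recognition", 100.0),
]
gains = ceiling_gains(72.0, stages)
best_stage = max(gains, key=lambda gain: gain[1])[0]
for name, gain in gains:
    print(f"{name}: +{gain} points")
```

With these numbers, text detection offers up to 17 points, character recognition 10, and character segmentation only 1, so segmentation is the least promising place to invest effort.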
Application example: Photo OCR
Problem description and pipeline
Machine Learning
The Photo OCR problem
Photo OCR pipeline
1. Text detection
2. Character segmentation
Another ceiling analysis example
Camera image → Preprocess (remove background) → Face detection → Eyes segmentation, Nose segmentation, Mouth segmentation → Logistic regression → Label
Positive examples
Negative examples
Sliding window detection
Sliding windows
Machine Learning
Text detection
Pedestrian detection
Supervised learning for pedestrian detection
x = pixels in 82x36 image patches
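Sliding window detection with such a patch classifier can be sketched as below: an 82x36 window is stepped across the image and every patch is passed to the classifier. This is a single-scale sketch only; the stride value, the list-of-lists image format, and the `mostly_bright` classifier are assumptions made so the example runs without any trained model.

```python
def sliding_windows(image_h, image_w, patch_h, patch_w, stride):
    """Yield the (top, left) corner of every window that fits in the image."""
    for top in range(0, image_h - patch_h + 1, stride):
        for left in range(0, image_w - patch_w + 1, stride):
            yield top, left


def detect(image, classifier, patch_h=82, patch_w=36, stride=4):
    """Return the corners of all patches the classifier accepts."""
    hits = []
    height, width = len(image), len(image[0])
    for top, left in sliding_windows(height, width, patch_h, patch_w, stride):
        patch = [row[left:left + patch_w] for row in image[top:top + patch_h]]
        if classifier(patch):
            hits.append((top, left))
    return hits


def mostly_bright(patch):
    # Stand-in for a trained pedestrian classifier (assumption).
    total = sum(sum(row) for row in patch)
    return total / (len(patch) * len(patch[0])) > 0.99


# Toy 90x44 image with one bright 82x36 block whose corner is at (4, 4).
image = [[0.0] * 44 for _ in range(90)]
for r in range(4, 86):
    for c in range(4, 40):
        image[r][c] = 1.0

hits = detect(image, mostly_bright)
print(hits)  # [(4, 4)]
```

A smaller stride finds objects more precisely at the cost of more classifier calls; detecting objects at other sizes would need larger windows rescaled to the classifier's input size, which this sketch leaves out.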
Noisy background: Machinery
Synthesizing data by introducing distortions
Distortion introduced should be representative of the type of noise/distortions in the test set.
  Audio: background noise, bad cellphone connection
Usually does not help to add purely random/meaningless noise to your data.
  x_i = intensity (brightness) of pixel i
  x_i ← x_i + random noise
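For the audio case, one common way to synthesize a distorted example is to mix a clean recording with a background recording at a chosen signal-to-noise ratio. The sketch below shows that arithmetic; `mix_at_snr` and `power` are illustrative names, and the Gaussian "background" is only a placeholder so the example is self-contained. In line with the advice above, real use would mix in recorded backgrounds of the kind expected at test time (crowd, machinery), not random noise.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a signal."""
    return sum(s * s for s in signal) / len(signal)


def mix_at_snr(clean, background, snr_db):
    """Add background to clean audio, scaled to the requested SNR in dB."""
    if len(background) < len(clean):
        raise ValueError("background must be at least as long as clean")
    background = background[:len(clean)]
    target_noise_power = power(clean) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / power(background))
    return [c + gain * b for c, b in zip(clean, background)]


# Toy signals (assumptions): a sine wave as "speech" and placeholder noise.
clean = [math.sin(2 * math.pi * 5 * t / 1000) for t in range(1000)]
rng = random.Random(0)
background = [rng.gauss(0.0, 1.0) for _ in range(1000)]

noisy = mix_at_snr(clean, background, snr_db=10.0)
```

Calling `mix_at_snr` with several backgrounds and SNR values turns one labelled clean clip into many labelled training clips.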
Positive examples
Negative examples
Photo OCR pipeline
1. Text detection
2. Character segmentation
3. Character classification
A N T