Literature Review (English Literature Review Template)


Text Recognition with Machine Learning based on Text Structure

Literature Review

Yifan Shi, Student ID: 27291944

Email: ys1n13@

MSc Artificial Intelligence

Faculty of Physical Sciences & Engineering, University of Southampton

Abstract—Fast-developing Machine Learning algorithms, now being introduced into the semantic domain, have brought a wide range of techniques for text recognition, classification, and processing. However, there is always a trade-off between accuracy and speed, as higher accuracy generally requires a more complicated system as well as a larger training database. To balance speed and accuracy, many clever designs are used in text processing. In this literature review, these efforts are introduced in three layers: Natural-Language Processing, Text Classification, and the IBM Watson system.

Keywords—Machine Learning, Natural-Language Processing, Text Classification, IBM Watson

I. INTRODUCTION

The growing popularity of the Internet has brought an increasing number of users online, along with a vast amount of messages, blogs, articles, etc. to be dealt with. These texts, known as natural-language texts, may contain useful information but take a long time for humans to read, understand, and process. Although popular search engine technology helps users find sources by keyword, many companies also need semantic techniques to make their working environments more user-friendly. In this literature review, I will introduce several important semantic techniques, starting with the most basic, Natural-Language Processing, which concentrates on the meaning of words and sentences, followed by Text Classification, which focuses on paragraphs and articles. Then, I will introduce a landmark system named IBM Watson, which uses DeepQA as its working pipeline. Finally, a conclusion offers some comments on these techniques.

II. NATURAL LANGUAGE PROCESSING

In order to deal with human natural language, it is necessary to transform unstructured text into well-structured tables of explicit semantics (Ferrucci, 2012). According to Liddy (2001), Natural-Language Processing (NLP) is a series of computational techniques used to analyze and represent naturally occurring text in order to accomplish certain tasks and applications. Collobert and Weston (2008) categorized NLP tasks into six types: Part-Of-Speech Tagging, Chunking, Named Entity Recognition, Semantic Role Labeling, Language Models, and Semantically Related Words. In addition, they applied Multitask Learning with Deep Neural Networks to build a unified architecture that is trained by backpropagation and so avoids the large number of empirical, hand-designed features that traditional systems rely on (Collobert et al., 2011).

III. TEXT CLASSIFICATION

One simple way to represent an article for a learning algorithm is to use the number of times that each distinct word appears in the document (Joachims, 2005). However, because of the large number of possible words used in articles, this creates a very high-dimensional feature space. Joachims (1999) suggests a Transductive
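The word-count representation described at the start of this section can be illustrated with a minimal sketch. It is not taken from Joachims' work: the function names `tokenize` and `bag_of_words`, the simple regular-expression tokenizer, and the two toy documents are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens.

    This crude tokenizer is an assumption of the sketch; real systems
    typically use more careful tokenization and stop-word handling.
    """
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(documents):
    """Return (vocabulary, vectors) with one count vector per document."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    # The vocabulary is every distinct word seen in any document.
    vocabulary = sorted(set().union(*counts))
    # Each document becomes a vector with one dimension per vocabulary word.
    vectors = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, vectors


# Hypothetical toy corpus, used only to show the shape of the output.
docs = [
    "The cat sat on the mat.",
    "The dog chased the cat.",
]
vocabulary, vectors = bag_of_words(docs)
print(vocabulary)  # ['cat', 'chased', 'dog', 'mat', 'on', 'sat', 'the']
print(vectors)     # [[1, 0, 0, 1, 1, 1, 2], [1, 1, 1, 0, 0, 0, 2]]
```

Each vector has one entry per vocabulary word, so its length grows with the number of distinct words in the corpus. Two short sentences already need seven dimensions, which suggests why a realistic collection of articles leads to the very high-dimensional feature space noted above.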
