Research on Feature Extraction Technology for Sentiment Analysis
Classified Index: TP391.1 U.D.C.: 681.37
Dissertation for the Master Degree in Engineering
RESEARCH ON FEATURE EXTRACTION TECHNOLOGY FOR SENTIMENT ANALYSIS
Master's Thesis, Harbin Institute of Technology
Contents

Abstract (Chinese)
Abstract
Chapter 1 Introduction
  1.1 Background and Significance
  1.2 Related Work in China and Abroad
    1.2.1 Word Polarity Classification
    1.2.2 Feature Extraction Techniques
    1.2.3 Typical Opinion Mining Systems
    1.2.4 Opinion Mining Research for Chinese
  1.3 Main Research Content of This Thesis
Chapter 2 Feature Extraction Based on Association Rules
  2.1 Definition of Product Features and the Extraction Approach
  2.2 Part-of-Speech Tagging of Chinese Text
  2.3 Mining Feature Candidates with Association Rules
    2.3.1 Basic Idea of Association Rule Mining
    2.3.2 The Apriori Algorithm
  2.4 Feature Filtering and Ranking
    2.4.1 Domain Relevance Filtering
    2.4.2 Non-Phrase Filtering
  2.5 Experiments and Result Analysis
  2.6 Chapter Summary
Chapter 3 Sentiment Analysis Based on Product Features
  3.1 Contextual Polarity Analysis of Polar Words
  3.2 Association Analysis of Product Features and Polar Words
    3.2.1 Obtaining Evaluation Targets
    3.2.2 Pairing Analysis of Features and Polar Words
  3.3 Experiments and Result Analysis
    3.3.1 Construction of the Polarity Lexicon
    3.3.2 Preparation of Other Resources
    3.3.3 Evaluation Method and Experimental Results
Candidate: Zhu Shanzong
Supervisor: Associate Prof. Liu Yuanchao
Academic Degree Applied for: Master of Engineering
Specialty: Computer Science and Technology
Unit: Department of Computer Science and Technology
Date of Oral Examination: June, 2009
University: Harbin Institute of Technology
Abstract
The Web contains a wealth of product reviews, posted in online forums, bulletin board systems (BBS), and virtual communities. Because these reviews are numerous and unstructured, the problem of mining opinions from review text has attracted growing attention from researchers. Mining opinions from online reviews can not only provide advice to potential purchasers but also help businesses track market feedback from product users. This thesis aims to improve feature extraction and opinion analysis algorithms for Chinese-language applications and to implement a prototype system that analyzes online product reviews.

Based on an analysis and summary of the findings, algorithms, and ideas of existing research in opinion mining, this thesis proposes a feature extraction algorithm based on association rules and an opinion analysis algorithm based on syntactic analysis, both for Chinese-language applications. A prototype system for online review analysis is implemented, through which we can identify and address problems that are not apparent before the algorithms are applied.

The main research work and contributions of this thesis are as follows.

Firstly, because product features are the topics of reviews and, like domain terms, are domain-dependent, an association-rule-based method is proposed for extracting product features from a review database. This method has been shown to be feasible and effective for English, and this thesis adapts it to Chinese.

Secondly, several feature filtering algorithms are proposed. Because product features are domain-dependent, a domain-relevance filtering algorithm is proposed to filter out inaccurate single nouns. Because product features usually appear as phrases in context, a non-phrase filtering algorithm is proposed to remove candidate noun patterns that cannot form noun phrases.
Thirdly, because subjective sentiment and syntactic structure can be complicated in review sentences, a method based on a syntactic parser is proposed. We first use the parser to analyze the structure of a sentence and then obtain the dependency relation between each polar word and its modifying adverbs,
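Abstract (Chinese)

Online forums, bulletin board systems (BBS), and virtual communities contain user reviews that are abundant but disorganized. How to mine opinions about product performance from this review text has drawn increasing attention from researchers in China and abroad. Mining opinions from online reviews can give potential buyers a reference and make it easier for businesses to track feedback from product users. The goal of this thesis is to improve feature extraction and opinion mining algorithms so that they suit Chinese text processing, and ultimately to implement a prototype system for analyzing online product reviews.

Building on an analysis and summary of existing results, algorithms, and ideas in opinion mining, and taking the characteristics of the Chinese language into account, this thesis proposes a product feature extraction algorithm based on association rules and an opinion analysis algorithm based on syntactic analysis. It also designs an online review analysis system based on the Google API and, through this practice, analyzes and summarizes the problems the two algorithms may encounter in application.

The main research work and contributions of this thesis are as follows.

First, product features appear in user reviews as the objects being reviewed and, like domain terms, are domain-dependent. Association rule mining is therefore applied to extract product features automatically from a review database. This approach has been shown to be feasible and effective for English, and this thesis adapts it for Chinese.

Second, several feature filtering algorithms are applied, based on the characteristics of product features. Because product features are domain-dependent, a domain relevance filtering algorithm is designed to filter out inaccurate single nouns. Because product features appear in text as phrases, a non-phrase filtering algorithm is designed to remove candidate noun patterns that cannot form noun phrases.

Third, to handle the complexity of subjective sentiment and syntactic expression in review sentences, a syntactic parser is used to analyze sentence structure and identify the dependency relations between polar words and their modifying adverbs, and between polar words and product features. On this basis, the thesis designs a context polarity analysis algorithm for polar words and a pairing analysis algorithm for polar words and product features, and uses them to analyze the polarity and strength of the opinion in review sentences.

Finally, the thesis designs a prototype system based on the Google API that automatically analyzes online product reviews and locates relevant opinionated subjective text accurately by constraining the keywords in the query. Through building the prototype, the thesis analyzes the problems that the feature extraction and opinion analysis algorithms encounter in practice and identifies their shortcomings and directions for improvement.

Keywords: online review; association rule; syntactic analysis; feature extraction; sentiment analysis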
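The association-rule step summarized in this abstract can be illustrated with a small Apriori-style sketch in Python. It is only an illustration under stated assumptions, not the thesis's implementation: each review sentence is assumed to have already been reduced to its nouns (for example by part-of-speech tagging), the support threshold is given as an absolute count, and the function name `apriori` and the sample data are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset(items): support count} for all frequent itemsets.

    transactions: iterable of item collections (here: nouns per sentence).
    min_count: minimum number of transactions an itemset must occur in.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            # Prune step: every (k-1)-subset must itself be frequent.
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count the support of each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical nouns extracted from five review sentences.
sentences = [
    ["screen", "resolution"],
    ["battery", "life"],
    ["screen", "resolution", "price"],
    ["battery", "life", "screen"],
    ["price"],
]
frequent_sets = apriori(sentences, min_count=2)
```

With this toy data, the noun pairs that co-occur in at least two sentences survive as multi-word feature candidates, while pairs seen only once are dropped. The filtering steps described in the abstract (domain relevance and non-phrase filtering) would then operate on such candidates.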
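Dissertation for the Master Degree in Engineering
Research on Feature Extraction Technology for Sentiment Analysis
Zhu Shanzong
Harbin Institute of Technology
June 2009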
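Classified Index: TP391.1
U.D.C.: 681.37
Dissertation for the Master Degree in Engineering
Research on Feature Extraction Technology for Sentiment Analysis
Candidate: Zhu Shanzong
Supervisor: Associate Prof. Liu Yuanchao
Academic Degree Applied for: Master of Engineering
Specialty: Computer Science and Technology
Unit: Department of Computer Science and Technology
Date of Oral Examination: June 2009
University: Harbin Institute of Technology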
and the dependency relation between the polar word and the product feature. On this basis, a context polarity analysis algorithm and a feature-sentiment pair identification algorithm are proposed, which are used to analyze the polarity and strength of the opinion in a review sentence.

Finally, a prototype system based on the Google API is designed for analyzing online product reviews; the system finds relevant subjective review text accurately by constraining the keywords in the queries sent to the Google server. The system implements the feature extraction and opinion analysis algorithms, and through this practice we can evaluate the performance of the two algorithms in a real application.

Keywords: online review, association rule, syntactic analysis, feature extraction, sentiment analysis
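The idea of combining a polar word with its modifying adverbs and pairing it with a product feature can be sketched as follows. This is a minimal illustration, not the thesis's algorithm: the dependency arcs are assumed to come from some syntactic parser as `(head, dependent, relation)` triples, the relation labels `"subj"` and `"adv"` are hypothetical, and the tiny lexicons and weights are invented for the example.

```python
# Hypothetical toy lexicons; the weights are illustrative assumptions only.
POLARITY = {"good": 1.0, "clear": 1.0, "bad": -1.0, "short": -1.0}
NEGATORS = {"not"}
INTENSIFIERS = {"very": 1.5, "slightly": 0.5}


def score_pairs(arcs):
    """Map each product feature to a signed opinion score.

    arcs: list of (head, dependent, relation) triples, where "subj" links a
    polar word to the feature it describes and "adv" links a polar word to
    an adverb that modifies it (both labels are assumed for illustration).
    """
    modifiers = {}  # polar word -> list of modifying adverbs
    subjects = {}   # polar word -> product feature it describes
    for head, dep, rel in arcs:
        if rel == "adv":
            modifiers.setdefault(head, []).append(dep)
        elif rel == "subj":
            subjects[head] = dep

    scores = {}
    for polar_word, feature in subjects.items():
        if polar_word not in POLARITY:
            continue  # head word carries no known polarity
        score = POLARITY[polar_word]
        for adverb in modifiers.get(polar_word, []):
            if adverb in NEGATORS:
                score = -score  # negation flips the polarity
            else:
                score *= INTENSIFIERS.get(adverb, 1.0)  # degree adverbs scale strength
        scores[feature] = score
    return scores


# Arcs for "the screen is very clear" and "the battery is not good".
arcs = [
    ("clear", "screen", "subj"),
    ("clear", "very", "adv"),
    ("good", "battery", "subj"),
    ("good", "not", "adv"),
]
scores = score_pairs(arcs)
```

The sign of each score stands for the polarity of the opinion and its magnitude for the strength. A fuller treatment would also need to handle several features per polar word and modifiers that are not directly attached to it.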