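WEKA Data Analysis Experiment
1. Introduction
This experiment uses Weka 3.6 to test a data sample. The classification methods tested are Naive Bayes, decision tree, and random tree; the clustering methods tested are DBSCAN and K-means.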
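2. Data sample
The goal is to become familiar with common data classification algorithms and with how Weka is used. The experiment therefore uses the "Vote" sample dataset that ships with Weka, as shown in the figure: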
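3. Association rule analysis
1) Procedure:
a) Click the "Explorer" button to open the "Weka Explorer" window.
b) Select the "Associate" tab.
c) Click the "Choose" button and select the "Apriori" associator.
d) Click the parameter text box and set the parameters in the dialog that opens (the values used in this run are recorded in the Scheme line of the run output).
e) Click the "Start" button on the left.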
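The parameters set in step d) show up as command-line style flags in the Scheme line of the run output. The sketch below is a reading aid for that string, assuming the flag meanings given in Weka's Apriori option listing. The helper `parse_scheme_options` and the table `OPTION_MEANINGS` are hypothetical names introduced here; they are not part of Weka.

```python
# Reading aid for the Apriori option string shown in the Scheme line.
# OPTION_MEANINGS and parse_scheme_options are illustrative names, not Weka APIs.
OPTION_MEANINGS = {
    "-I": "also print the frequent itemsets",
    "-N": "required number of rules",
    "-T": "metric type used to rank rules (0 = confidence)",
    "-C": "minimum metric score (here, minimum confidence)",
    "-D": "delta by which minimum support is lowered each cycle",
    "-U": "upper bound for minimum support",
    "-M": "lower bound for minimum support",
    "-S": "significance level (-1.0 means no significance test)",
    "-c": "class index (-1 means the last attribute)",
}


def is_flag(token):
    """A flag is '-' followed by a letter; '-1' and '-1.0' are values."""
    return token.startswith("-") and token[1:2].isalpha()


def parse_scheme_options(option_string):
    """Map each flag to its value, or to True for flags without a value."""
    tokens = option_string.split()
    options = {}
    i = 0
    while i < len(tokens):
        flag = tokens[i]
        if i + 1 < len(tokens) and not is_flag(tokens[i + 1]):
            options[flag] = tokens[i + 1]
            i += 2
        else:
            options[flag] = True
            i += 1
    return options


if __name__ == "__main__":
    scheme = "-I -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.5 -S -1.0 -c -1"
    for flag, value in parse_scheme_options(scheme).items():
        print(flag, value, "->", OPTION_MEANINGS.get(flag, "unknown"))
```

Read this way, the run asks for up to 10 rules ranked by confidence, with a minimum confidence of 0.9 and a minimum support that may fall from 1.0 to 0.5 in steps of 0.05.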
2) Results:
=== Run information ===
Scheme: weka.associations.Apriori -I -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.5 -S -1.0 -c -1
Relation: vote
Instances: 435
Attributes: 17
handicapped-infants
water-project-cost-sharing
adoption-of-the-budget-resolution
physician-fee-freeze
el-salvador-aid
religious-groups-in-schools
anti-satellite-test-ban
aid-to-nicaraguan-contras
mx-missile
immigration
synfuels-corporation-cutback
education-spending
superfund-right-to-sue
crime
duty-free-exports
export-administration-act-south-africa
Class
=== Associator model (full training set) ===
Apriori
=======
Minimum support: 0.5 (218 instances)
Minimum metric <confidence>: 0.9
Number of cycles performed: 10
Generated sets of large itemsets:
Size of set of large itemsets L(1): 12
Large Itemsets L(1):
handicapped-infants=n 236
adoption-of-the-budget-resolution=y 253
physician-fee-freeze=n 247
religious-groups-in-schools=y 272
anti-satellite-test-ban=y 239
aid-to-nicaraguan-contras=y 242
synfuels-corporation-cutback=n 264
education-spending=n 233
crime=y 248
duty-free-exports=n 233
export-administration-act-south-africa=y 269
Class=democrat 267
Size of set of large itemsets L(2): 4
Large Itemsets L(2):
adoption-of-the-budget-resolution=y physician-fee-freeze=n 219
adoption-of-the-budget-resolution=y Class=democrat 231
physician-fee-freeze=n Class=democrat 245
aid-to-nicaraguan-contras=y Class=democrat 218
Size of set of large itemsets L(3): 1
Large Itemsets L(3):
adoption-of-the-budget-resolution=y physician-fee-freeze=n Class=democrat 219
Best rules found:
1. adoption-of-the-budget-resolution=y physician-fee-freeze=n 219 ==> Class=democrat 219 conf:(1)
2. physician-fee-freeze=n 247 ==> Class=democrat 245 conf:(0.99)
3. adoption-of-the-budget-resolution=y Class=democrat 231 ==> physician-fee-freeze=n 219 conf:(0.95)
4. Class=democrat 267 ==> physician-fee-freeze=n 245 conf:(0.92)
5. adoption-of-the-budget-resolution=y 253 ==> Class=democrat 231 conf:(0.91)
6. aid-to-nicaraguan-contras=y 242 ==> Class=democrat 218 conf:(0.9)
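Each rule in the list above prints two counts: the number of instances matching the antecedent, and the number matching both antecedent and consequent. The confidence is the second divided by the first. The short check below recomputes the six values from the printed counts. The names `RULES` and `confidence` are introduced here for illustration, and rounding to two decimals is an assumption about how the values are displayed.

```python
# (rule, antecedent count, count of antecedent plus consequent, confidence as printed)
RULES = [
    ("adoption-of-the-budget-resolution=y physician-fee-freeze=n ==> Class=democrat", 219, 219, 1.0),
    ("physician-fee-freeze=n ==> Class=democrat", 247, 245, 0.99),
    ("adoption-of-the-budget-resolution=y Class=democrat ==> physician-fee-freeze=n", 231, 219, 0.95),
    ("Class=democrat ==> physician-fee-freeze=n", 267, 245, 0.92),
    ("adoption-of-the-budget-resolution=y ==> Class=democrat", 253, 231, 0.91),
    ("aid-to-nicaraguan-contras=y ==> Class=democrat", 242, 218, 0.9),
]


def confidence(antecedent_count, rule_count):
    """Confidence of A ==> B: count(A and B) / count(A)."""
    return rule_count / antecedent_count


if __name__ == "__main__":
    for rule, antecedent_count, rule_count, printed in RULES:
        value = confidence(antecedent_count, rule_count)
        status = "matches" if round(value, 2) == printed else "differs from"
        print(f"{value:.4f} {status} printed {printed}: {rule}")
```

Rule 6, for example, holds for 218 of the 242 instances with aid-to-nicaraguan-contras=y, which is just above the 0.9 threshold.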
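3) Analysis of results:
a) The sample contains 435 instances with 17 attributes, and Apriori performed 10 cycles.
b) The minimum support is 0.5, so an itemset must occur in at least 218 instances (0.5 × 435 = 217.5, rounded up).
c) The minimum confidence is 0.9.
d) The search ran for 10 cycles and, at the final minimum support, found 12 frequent 1-itemsets, 4 frequent 2-itemsets, and 1 frequent 3-itemset. Only 6 rules reach the 0.9 confidence threshold, fewer than the 10 requested with -N 10.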
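The figures in this analysis can be reproduced with a little arithmetic. The sketch below computes the instance threshold for a given minimum support, and lists the support levels tried as the minimum support is lowered by the -D delta. It assumes the search starts one delta below the -U upper bound and stops at the -M lower bound; that assumption is consistent with the 10 cycles reported, but it is not stated in the output itself. The function names are illustrative.

```python
import math
from fractions import Fraction


def min_support_count(min_support, num_instances):
    """Smallest instance count that satisfies the minimum support."""
    return math.ceil(min_support * num_instances)


def support_schedule(upper, lower, delta):
    """Support levels tried, assuming the first cycle uses upper - delta."""
    levels = []
    support = upper - delta
    while support >= lower:
        levels.append(support)
        support -= delta
    return levels


if __name__ == "__main__":
    # Fractions avoid floating-point drift when subtracting 0.05 repeatedly.
    upper, lower, delta = Fraction("1.0"), Fraction("0.5"), Fraction("0.05")
    levels = support_schedule(upper, lower, delta)
    print("cycles:", len(levels))
    print("levels:", [float(level) for level in levels])
    print("instances needed at 0.5:", min_support_count(lower, 435))
```

Under that assumption the search stops at the lower bound because fewer than 10 rules meet the confidence threshold, even at a minimum support of 0.5.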
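4. Classification algorithm: random tree analysis
1) Procedure:
a) Click the "Explorer" button to open the "Weka Explorer" window.
b) Select the "Classify" tab.
c) Click the "Choose" button and select the "RandomTree" classifier under "trees".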