System of KDD Tasks and Results within the STULONG Project


Work Analysis and Work Plan (English Version)
Introduction
Definition of Work Analysis
Work Analysis is a process of studying the nature, characteristics, and requirements of work tasks.
It involves breaking down work into its constituent elements and analyzing them to understand their relationships and dependencies.
It identifies the human, technical, and material resources required for project execution, as well as potential risks and how they will be mitigated or managed.
Case Study 2: Work Plan in a Software Development Project
Summary: requirements analysis, time management, team collaboration.
Details: In a software development project, a detailed work plan is essential. First, carry out requirements analysis to define the software's functional and performance requirements, which provides the basis for subsequent development. Second, manage time well: schedule the development according to the project's complexity and the team's capacity so that the project is delivered on time. In addition, strengthen team collaboration and communicate effectively.
Analyze work: break the project down into smaller, manageable tasks and estimate the effort required for each task.
Prioritize tasks: rank the tasks by urgency, importance, and dependencies so that the most critical work is scheduled first, as in the sketch below.
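As a hedged illustration of this breakdown-and-prioritize step (the task names, effort figures, and scoring rule below are invented for the example, not taken from the text), a minimal Python sketch might look like this:

```python
from dataclasses import dataclass

@dataclass
class Task:
    """One work item produced by breaking the project down."""
    name: str
    effort_days: float  # effort estimate from the work analysis
    urgency: int        # 1 (low) .. 5 (high)
    importance: int     # 1 (low) .. 5 (high)

    def priority(self) -> float:
        # Illustrative scoring rule: weight importance slightly above urgency.
        return 0.6 * self.importance + 0.4 * self.urgency

# Hypothetical breakdown of a small software project.
tasks = [
    Task("Requirements analysis", 5, urgency=5, importance=5),
    Task("UI prototype", 3, urgency=3, importance=4),
    Task("Core module", 8, urgency=4, importance=5),
    Task("Regression tests", 4, urgency=2, importance=4),
]

for task in sorted(tasks, key=Task.priority, reverse=True):
    print(f"{task.name}: priority {task.priority():.1f}, {task.effort_days} days")
```

The scoring rule is only a placeholder; any ranking that reflects the project's own constraints can be substituted.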

Electronic Hydraulic Steering System and Its PID Controller Applied to the Third Axle of Multi-axle Vehicles

Qian Lijun, Hu Weilong, Qiu Lihong, Liu Shaojun
Hefei University of Technology, Hefei, 230009
China Mechanical Engineering, Vol. 26, No. 22, November 2015. DOI: 10.3969/j.issn.1004-132X.2015.22.005. CLC number: U463.4. Received 2015-01-12. Supported by the Electronic Information Industry Development Fund of the Ministry of Industry and Information Technology (Cai [2009] No. 453) and by an AVIC industry-university-research cooperative innovation project (CXY2010HFGD26).

Abstract: To reduce wear on the rear tires of multi-axle vehicles, an electronically controlled hydraulic steering system for the third axle was designed. The work focuses on the hydraulic actuator of the system and on the working principle of its centering, self-locking cylinder. The expected third-axle steering angle was fitted according to the Ackermann steering theorem, a model of the electro-hydraulic steering system was built, a fractional-order PID controller was designed, and a method for selecting the controller parameters was proposed. Finally, simulation analyses, a bench test, and a full-vehicle test were carried out. The fitting results show that at vehicle speeds of 10 m/s and 20 m/s the residual square between the expected and actual third-axle angles is within 0.16 and the goodness of fit is above 0.985. The simulation results show that the fractional-order PID control system has smaller overshoot and shorter settling time than an integer-order PID control system. The bench tests show that at 10 m/s and 20 m/s the error between the expected and actual third-axle angles stays within ±0.3°. The vehicle tests show qualitatively that, with the third-axle electronically controlled hydraulic steering system installed, tire wear improves under both no-load and full-load conditions.

Keywords: multi-axle vehicle; steering system; hydraulic actuator; expected steering angle; fractional-order PID controller

0 Introduction

On traditional multi-axle vehicles, the third axle is usually steered by a mechanical power-assisted linkage that follows the first axle. This arrangement does not satisfy the Ackermann
geometric relationship, so the rear tires wear easily [1]. An electronically controlled hydraulic steering system is therefore needed on the third axle to control its steering angle. Research at home and abroad has mostly stayed at the theoretical stage: references [2-3] built ADAMS models of multi-axle steering systems; references [4-5] analyzed and derived the steering relationships between different axles based on a zero center-of-mass sideslip angle; and references [6-7] modeled the electro-hydraulic system, proposed a fuzzy-adaptive PID control strategy, and ran simulations in MATLAB.

This paper designs an electronically controlled hydraulic steering system that lets the controller turn the third axle according to the first-axle angle, together with a centering, self-locking hydraulic cylinder that prevents the third axle from deflecting when a loaded truck drives straight for long periods. The expected third-axle angle is fitted piecewise, a model of the electro-hydraulic steering system is built, a fractional-order PID controller is designed, a method for selecting its parameters is proposed, and the design is verified by simulation and experiment.

1 Design of the Electronically Controlled Hydraulic Steering System

The system studied here controls third-axle steering on an 8×2 heavy truck. The first and second axles are steered by a mechanical-hydraulic power steering system, and the fourth axle is a non-steering drive axle. The electronically controlled hydraulic steering system consists of electro-hydraulic proportional valves, auxiliary devices, a controller, angle sensors, and a speed sensor. The angle sensors sit on top of the steering kingpins of the first and third axles, and the hydraulic cylinder is mounted on the third axle's steering tie rod. The angle sensors measure the current first- and third-axle angles, the controller computes the expected third-axle angle, and the electro-hydraulic proportional valves drive the centering, self-locking cylinder to turn the third axle.

1.1 Hydraulic Actuator

The actuator principle is shown in Fig. 1. The controller commands the flow and the on/off state of each valve. In the steering system, the left end of the centering, self-locking cylinder is fixed to the axle, and its right end is connected through the rod at face A1 to the third axle's tie rod; the whole cylinder barrel is floating-mounted. When the left piston has moved to the right end face of the left chamber (face A2) and the right piston has moved to the right end face of the right chamber (face A1), the third-axle wheels run straight.

[Fig. 1: Schematic of the hydraulic actuator]

In normal operation, valve 3 is open and valves 1 and 2 are closed; the controller moves the centering, self-locking cylinder through proportional directional valves 4 and 5 to realize third-axle steering. When the system detects a fault, valves 1 and 2 open, valve 3 closes, and valves 4 and 5 return to their neutral positions. Valve 1 moves the left chamber of the cylinder to face A2 and valve 2 moves the right chamber to face A1, so the third-axle angle no longer changes with the first-axle angle.

1.2 Centering, Self-Locking Cylinder

Loaded trucks spend much of their time driving straight, during which the third-axle wheels should not deflect; that is, the cylinder operates in its centered state for long periods. Centering cylinders used in steering systems usually lock at the center with high-pressure oil; the drawback is that the lock is not firm and is unstable under shocks from the road. To overcome this, a centering hydraulic cylinder with a reliable mechanical self-lock at the middle position was designed; its structure is shown in Fig. 2. The locking plate is connected to a tension spring, and controlling the spring switches the cylinder between the locked and unlocked states. During locking, the oil drives the piston, whose annular boss pushes the locking plate out so that it engages an annular groove, producing a mechanical lock. During unlocking, the oil flows in the opposite direction and pushes the piston towards the unlocking side, and under the tension of the spring the locking plate retracts into its mounting base, releasing the lock. Under normal conditions the locking plate stays inside the mounting base and does not protrude, so the cylinder is not locked. The spring and locking-plate assembly is shown in Fig. 3.

[Fig. 2: Assembly drawing of the centering, self-locking cylinder]
[Fig. 3: Structure of the spring locking-plate assembly]

2 Expected Third-Axle Steering Angle

Neglecting the stiffness of the steering system and assuming the wheels roll purely, the Ackermann principle relates the outer and inner steering angles on the same axle:

$$\cot\alpha_i=\cot\beta_i+\frac{B}{l_i}\qquad(1)$$

where $\alpha_i$ is the outer-wheel angle of the $i$-th axle, $\beta_i$ the inner-wheel angle, $B$ the track width, and $l_i$ the distance from the wheel center of the $i$-th axle to the instantaneous center of rotation.

The Ackermann principle also requires the outer-wheel angles of different axles to satisfy

$$\frac{\alpha_i}{\alpha_1}\approx\frac{\tan\alpha_i}{\tan\alpha_1}=\frac{l_i}{l_1}=\frac{L_i-\Delta}{l_1}\qquad(2)$$

where $L_i$ is the distance from the wheel center of the $i$-th axle to the vehicle's center of mass and $\Delta$ is the distance from the instantaneous center of rotation to the center of mass.

According to the zero center-of-mass sideslip control strategy of reference [8],

$$\Delta=\frac{mv^{2}\sum_{i=1}^{n}C_iL_i^{2}}{\sum_{i=1}^{n}C_i\sum_{i=1}^{n}C_iL_i^{2}-\bigl(\sum_{i=1}^{n}C_iL_i\bigr)^{2}+mv^{2}\sum_{i=1}^{n}C_iL_i}\qquad(3)$$

where $m$ is the vehicle mass, $v$ the vehicle speed, and $C_i$ the combined cornering stiffness of the $i$-th axle. Some of the vehicle parameters are listed in Table 1.

Table 1  Vehicle parameters
m = 54 000 kg; L1 = 2.20 m; L2 = 0.60 m; L3 = 0.52 m; L4 = 1.88 m
C1 = 440 kN/rad; C2 = 440 kN/rad; C3 = 474 kN/rad; C4 = 474 kN/rad

Using the Ackermann angle formula, the target third-axle wheel angle corresponding to each first-axle wheel angle can be computed at each speed and fitted with piecewise straight lines. With $y$ the expected third-axle angle and $x$ the first-axle angle, at $v=10$ m/s:

$$y=\begin{cases}0.3004x+2.0904^{\circ}, & x<-21^{\circ}\\ 0.1880x, & -21^{\circ}\le x\le 18^{\circ}\\ 0.2972x-1.8914^{\circ}, & x>18^{\circ}\end{cases}\qquad(4)$$

and at $v=20$ m/s:

$$y=\begin{cases}0.3516x+2.0770^{\circ}, & x<-21^{\circ}\\ 0.2455x, & -21^{\circ}\le x\le 18^{\circ}\\ 0.35667x-2.0230^{\circ}, & x>18^{\circ}\end{cases}\qquad(5)$$

The first steering axle turns within $[-45^{\circ},45^{\circ}]$. Over this range, at $v=10$ m/s and $v=20$ m/s, the actual first- and third-axle wheel angles were measured and the corresponding expected third-axle angles were computed; the results are shown in Fig. 4.

[Fig. 4: Actual and expected third-axle steering angles]

The residual square and goodness of fit between the expected and actual values are: at $v=10$ m/s, 0.1536 and 0.9886; at $v=20$ m/s, 0.1598 and 0.9859. The fitted three-segment lines therefore match reality well.

3 Model of the Electronically Controlled Hydraulic Steering System

The model consists of two parts: the steering-system model and the electro-hydraulic proportional valve model. The proportional valve has displacement feedback; from its kinematics, its transfer function is [9]

$$\frac{x(s)}{I(s)}=\frac{k_1k_2}{\frac{s^{2}}{\omega_n^{2}}+\frac{2\xi_n s}{\omega_n}+1}\qquad(6)$$

where $x(s)$ is the spool displacement, $I(s)$ the proportional-valve current, $k_1$ the proportional-amplifier gain, $k_2$ the proportional-valve amplification factor, $\omega_n$ the natural frequency of the control valve, and $\xi_n$ the relative hydraulic damping coefficient. The spool displacement $x(s)$ equals the displacement $x_v(s)$ of the third-axle tie rod. From the steering-system model, the transfer function between the tie-rod displacement $x_v(s)$ and the third-axle angle $\theta(s)$ is

$$\frac{\theta(s)}{x_v(s)}=\frac{A\rho}{\frac{s^{2}}{\omega_h^{2}}+\frac{2\xi_h s}{\omega_h}+1}\qquad(7)$$

where $A$ is the piston area, $\rho$ a system coefficient, $\omega_h$ the unloaded hydraulic natural frequency, and $\xi_h$ the hydraulic damping ratio. Combining Eqs. (6) and (7) gives the transfer function between the third-axle angle $\theta(s)$ and the valve current $I(s)$:

$$G_c(s)=\frac{\theta(s)}{x_v(s)}\cdot\frac{x(s)}{I(s)}=\frac{k_1k_2A\rho}{\bigl(\frac{s^{2}}{\omega_n^{2}}+\frac{2\xi_n s}{\omega_n}+1\bigr)\bigl(\frac{s^{2}}{\omega_h^{2}}+\frac{2\xi_h s}{\omega_h}+1\bigr)}\qquad(8)$$

In general $\omega_n$ is much larger than $\omega_h$, so the fast valve dynamics can be neglected and $G_c(s)$ approximated by the slower hydraulic dynamics:

$$G_c(s)\approx\frac{k_1k_2A\rho}{\frac{s^{2}}{\omega_h^{2}}+\frac{2\xi_h s}{\omega_h}+1}\qquad(9)$$

Substituting the vehicle parameters gives

$$G_c(s)\approx\frac{15600}{s^{2}+73s+6089}\qquad(10)$$
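As a quick plausibility check of the plant model in Eq. (10), its open-loop step response and an ITAE figure can be computed with SciPy. This script is an illustrative sketch added here, not part of the original paper:

```python
import numpy as np
from scipy import signal

# Plant from Eq. (10): G_c(s) = 15600 / (s^2 + 73 s + 6089).
plant = signal.TransferFunction([15600.0], [1.0, 73.0, 6089.0])

t = np.linspace(0.0, 1.0, 2000)
t, y = signal.step(plant, T=t)

# The DC gain is 15600/6089 (about 2.56), so normalize the response
# before measuring the error against a unit step.
y_norm = y / (15600.0 / 6089.0)
itae = np.trapz(t * np.abs(1.0 - y_norm), t)
print(f"final value {y[-1]:.3f}, normalized ITAE {itae:.4f}")
```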
4 Fractional-Order PID Controller for the Third-Axle Electro-Hydraulic Power Steering System and Solution of Its Parameters

Fractional calculus and methods for evaluating it are discussed in detail in references [10-14]; here the Caputo definition is adopted. With the help of MATLAB, the fractional operators are evaluated from a truncated MacLaurin expansion, using Simpson's formula for the s-domain expression. A solver module for fractional calculus was programmed, a MATLAB function fractionC that evaluates the fractional MacLaurin expansion was written, and a fractional-order block was built in Simulink.

4.1 Formulation of the Fractional-Order PID

Compared with an integer-order PID controller, a fractional-order PID controller has an additional integration order $\lambda$ and differentiation order $\mu$, making the control more precise and flexible. Its differential-equation form is

$$u(t)=k_Pe(t)+k_ID^{-\lambda}e(t)+k_DD^{\mu}e(t)\qquad(11)$$

where $D^{-\lambda}$ denotes the $\lambda$-order integral and $D^{\mu}$ the $\mu$-order derivative. Taking the Laplace transform of Eq. (11) gives the fractional PID transfer function [15-17]:

$$G_{foc}(s)=k_P+k_Is^{-\lambda}+k_Ds^{\mu}\qquad(12)$$

For the closed-loop system, the characteristic equation is

$$1+G_c(s)G_{foc}(s)=0\qquad(13)$$

We look for a gain margin $A_m$ and a phase margin $\varphi_m$ of the system satisfying

$$A_m=\frac{1}{\bigl|G_{foc}(j\omega_p)G_c(j\omega_p)\bigr|},\qquad \varphi_m=\arg\bigl(G_{foc}(j\omega_g)G_c(j\omega_g)\bigr)+\pi\qquad(14)$$

where $\omega_p$ and $\omega_g$ satisfy

$$\bigl|G_{foc}(j\omega_g)G_c(j\omega_g)\bigr|=1,\qquad \arg\bigl(G_{foc}(j\omega_p)G_c(j\omega_p)\bigr)=-\pi\qquad(15)$$

4.2 Solving the Fractional PID Parameters with MATLAB

With a target gain margin $A_m=1.5$ and a target phase margin of $\pi/2$, substituting Eq. (10) into Eqs. (13)-(15) yields

$$\begin{cases}k_P+k_I\omega_p^{-\lambda}\cos\frac{\pi\lambda}{2}+k_D\omega_p^{\mu}\cos\frac{\pi\mu}{2}=\frac{2}{3}\cdot\frac{\omega_p^{2}-6089}{15600}\\[2pt]-k_I\omega_p^{-\lambda}\sin\frac{\pi\lambda}{2}+k_D\omega_p^{\mu}\sin\frac{\pi\mu}{2}=-\frac{2}{3}\cdot\frac{73\,\omega_p}{15600}\\[2pt]k_P+k_I\omega_g^{-\lambda}\cos\frac{\pi\lambda}{2}+k_D\omega_g^{\mu}\cos\frac{\pi\mu}{2}=\frac{73\,\omega_g}{15600}\\[2pt]-k_I\omega_g^{-\lambda}\sin\frac{\pi\lambda}{2}+k_D\omega_g^{\mu}\sin\frac{\pi\mu}{2}=\frac{\omega_g^{2}-6089}{15600}\end{cases}\qquad(16)$$

Equation (16) contains seven parameters ($k_P$, $k_I$, $k_D$, $\lambda$, $\mu$, $\omega_p$, $\omega_g$) but only four equations, so $\lambda$ and $\mu$ are stepped from 0.1 to 5.0 in increments of 0.1. The optimization objective is the integral of time multiplied by the absolute error (ITAE):

$$f(x)=J_{ITAE}=\int_{0}^{\infty}t\,|v(t)-y(t)|\,\mathrm{d}t\qquad(17)$$

The system performance is considered optimal when $J_{ITAE}$ is minimal. The PID parameters are solved in MATLAB with the fmincon function, whose mathematical model is

$$\min_{x}f(x)\quad\text{s.t.}\quad c(x)\le 0,\;Ax\le b,\;lb\le x\le ub,\;c_{eq}(x)=0,\;A_{eq}x=b_{eq}\qquad(18)$$

Let $k_P$, $k_I$, $k_D$, $\lambda$, $\mu$, $\omega_p$, $\omega_g$ correspond to $x_1,\dots,x_7$. The initial point is (-10, -10, -10, 0.1, 0.1, 0, 0), the lower bounds $lb$ are (-10, -10, -10, 0.1, 0.1, 0, 0), the upper bounds $ub$ are (10, 10, 10, 5, 5, 90, 90), and the equality constraints $c_{eq}(x)$ are the four equations of Eq. (16). MATLAB returns the optimal $k_P$, $k_I$, $k_D$, $\lambda$, $\mu$ as 18, 0.15, 10.5, 1.8 and 1.5, with $J_{ITAE}=2.47$, so the fractional-order PID controller is

$$G_{frc\text{-}pid}=18+0.15s^{-1.8}+10.5s^{1.5}\qquad(19)$$
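The fractional operators of Eq. (11) can also be sketched numerically. The snippet below uses the Grünwald-Letnikov approximation as a stand-in for the Caputo/MacLaurin scheme the paper implements in MATLAB and Simulink, with an invented error signal; it only illustrates how the controller law is evaluated:

```python
import numpy as np

def gl_weights(alpha: float, n: int) -> np.ndarray:
    """Grunwald-Letnikov weights w_j = (-1)^j * C(alpha, j), via recurrence."""
    w = np.empty(n)
    w[0] = 1.0
    for j in range(1, n):
        w[j] = w[j - 1] * (1.0 - (alpha + 1.0) / j)
    return w

def frac_diff(x: np.ndarray, alpha: float, h: float) -> np.ndarray:
    """GL fractional derivative of order alpha; alpha < 0 gives an integral."""
    w = gl_weights(alpha, len(x))
    y = np.array([np.dot(w[:k + 1], x[k::-1]) for k in range(len(x))])
    return y / h**alpha

# Controller parameters from Eq. (19).
kP, kI, kD, lam, mu = 18.0, 0.15, 10.5, 1.8, 1.5

h = 1e-3
t = np.arange(0.0, 0.5, h)
e = np.exp(-5.0 * t)  # invented decaying error signal for the demo

# Eq. (11): u(t) = kP*e + kI*D^{-lambda} e + kD*D^{mu} e
u = kP * e + kI * frac_diff(e, -lam, h) + kD * frac_diff(e, mu, h)
print(u[:5])
```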
5 Experimental Verification

5.1 Simulation

With the ranges of $\lambda$ and $\mu$ fixed, the ITAE index behaves as shown in Fig. 5: as $\mu$ grows from 0 to 1.5 the ITAE falls, and beyond 1.5 it rises, so $\mu=1.5$ is optimal. With $\mu=1.5$ and a unit-step input, sweeping $\lambda$ from 0 to 5 gives the ITAE curve of Fig. 6, which is minimal at $\lambda=1.8$. The simulation thus shows that the proposed algorithm performs best at $\lambda=1.8$ and $\mu=1.5$, consistent with the fractional PID parameters found by fmincon.

[Fig. 5: Effect of μ on the ITAE of the step response]
[Fig. 6: Effect of λ on the ITAE of the step response]

Solving for an integer-order PID instead gives the controller

$$G_{int\text{-}pid}=4.0+0.0015s^{-1}+3s\qquad(20)$$

The unit-step responses of the system under the fractional-order PID controller, under the integer-order PID controller, and without a controller are compared in Fig. 7. The fractional-order PID shows smaller overshoot and a shorter settling time than the integer-order PID.

[Fig. 7: Step responses under fractional-order PID control, integer-order PID control, and no control]

5.2 Bench Test

Before the vehicle test, a test bench was designed to verify the reliability of the system. On the bench, the first and third axles are replaced by two steering kingpins fitted with angle sensors; first-axle steering is simulated with a hand lever, and the angles are read from pointers and dials on the two kingpins. The maximum outer-wheel angle of the first axle is 27° and the maximum inner-wheel angle is 33°. Figure 8 shows how the third-axle angle varies with the first-axle angle: curves 1 and 2 show the error from the expected angle and the third-axle angle without the electro-hydraulic steering system; curves 3 and 4 show the same quantities under integer-order PID control; curves 5 and 6 show them under fractional-order PID control; and curve 7 is the expected third-axle angle. At both $v=10$ m/s and $v=20$ m/s, the error between the actual and expected third-axle angles under the fractional-order PID controller stays within the allowed ±0.3°, and the fractional-order PID steers the third axle better than both the integer-order PID and the configuration without the third-axle electro-hydraulic steering system.

[Fig. 8: Third-axle angles and their errors from the expected angle under fractional-order PID control, integer-order PID control, and without the electro-hydraulic steering system, at (a) v = 10 m/s and (b) v = 20 m/s]

5.3 Vehicle Test

A full-vehicle test was carried out to verify the steering performance of the system in actual vehicle operation. Comparing the tire wear after unladen turning without the electro-hydraulic steering system (Fig. 9a) against that with the fractional-order-controlled system installed (Fig. 9b), the difference is obvious: the tire in Fig. 9a is severely worn, while the tire in Fig. 9b shows little wear. With the system installed, tire wear in unladen turning improves greatly. In Fig. 9c the vehicle is loaded with iron blocks; the third axle carries the largest load, 7.5 t on the single axle. During the test the vehicle ran well in normal driving and showed no tire-wear problems (Fig. 9d), showing that with the system installed, tire wear in fully laden turning also improves greatly.

[Fig. 9: Vehicle test: (a) tire wear after unladen turning without the electro-hydraulic steering system; (b) tire wear after unladen turning with the fractional-order PID electro-hydraulic steering system; (c) full-load test; (d) tire wear after the full-load test]

6 Conclusions

(1) The hydraulic actuator of the electronically controlled hydraulic steering system was studied in depth. Because the actuator includes a centering, self-locking cylinder, the third axle does not deflect by itself during long periods of straight-line driving.

(2) A model of the electronically controlled hydraulic steering system and a fractional-order PID controller were built, and the controller parameters were solved. The simulation results verify the correctness of the parameter-selection method.

(3) Bench tests show that with the fractional-order PID-controlled electro-hydraulic steering system installed, the actual third-axle angle is closer to the expected angle than with an integer-order PID or without the system, and the error is small. Vehicle tests show that, compared with not installing the system, tire wear improves under both no-load and full-load conditions.

References:
[1] Liu Shaojun. Research on the Third-Axle Electronically Controlled Hydraulic Steering System of Multi-axle Vehicles [D]. Hefei: Hefei University of Technology, 2013.
[2] Zhu Yongqiang, Zhang Pingxia. Steering Analysis of Multi-axle Vehicle Based on ADAMS/VIEW [C] // 2nd International Conference on Advanced Engineering Materials and Technology (AEMT). Zhuhai, China, 2012: 2878-2881.
[3] Liu Yun. Optimum Design of Multi-axle Trailer's Steering Mechanism Based on ADAMS [C] // International Conference on Green Power, Materials and Manufacturing Technology and Applications (GPMMTA 2011). Chongqing, 2011: 289-293.
[4] Wang Shufeng, Zhang Junyou. The Design and Performance Analysis of Multi-axle Dynamic Steering System [C] // International Conference on Applied Mechanics and Mechanical Engineering. Changsha, 2010: 756-761.
[5] Wang Shufeng, Li Huashi. Analysis of Vehicle Parameters Effects on Steering Performance of Three-Axle Vehicle with Multi-axle Steering [C] // 2nd International Conference on Modelling and Simulation. Tokyo, 2009: 240-244.
[6] Wang Yunchao. Research on the Steering Performance of Multi-axle Steering Vehicles [D]. Changchun: Jilin University, 2007.
[7] Han Wangli. Design and Simulation of a Multi-axle Vehicle Steering Control System [D]. Changsha: Hunan University, 2011.
[8] Tian Yangyang. Research on the Electro-hydraulic Proportional Steering Control System of Multi-axle Vehicles [D]. Changchun: Jilin University, 2008.
[9] Jiang Guiyun, Wang Yongqin, Yan Xingchun. Mathematics Modeling and Simulation Analysis of Dynamic Characteristics for Hydraulic Cylinder by Servo-valve [J]. Journal of Sichuan University (Engineering Science Edition), 2008, 40(5): 195-198.
[10] Wang Miao. Design and Simulation of Fractional-Order Controllers [D]. Beijing: Beijing Jiaotong University, 2014.
[11] Kiryakova V. From the Hyper-Bessel Operators of Dimovski to the Generalized Fractional Calculus [J]. Fractional Calculus and Applied Analysis, 2014(12): 977-1000.
[12] Srivastava H M, Gaboury S, Bayad A. Expansion Formulas for an Extended Hurwitz-Lerch Zeta Function Obtained via Fractional Calculus [J]. Advances in Difference Equations, 2014(6): 169.
[13] Valerio D, Machado J T, Kiryakova V. Some Pioneers of the Applications of Fractional Calculus [J]. Fractional Calculus and Applied Analysis, 2014(6): 552-578.
[14] Machado J T, Kiryakova V, Mainardi F. Recent History of Fractional Calculus [J]. Communications in Nonlinear Science and Numerical Simulation, 2011, 16(3): 1140-1153.
[15] Zhao Chunna. Analysis and Design of Fractional-Order Systems [M]. Beijing: National Defense Industry Press, 2011.
[16] Deng Liwei, Song Shenmin, Pang Hui. Fractional Order Model for Control System and Design of Fractional Order PIλDμ Controller [J]. Electric Machines and Control, 2014, 18(3): 85-92.
[17] Zhao Chunna, Xue Dingyu, Chen Yangquan. A Fractional Order PID Tuning Algorithm for a Class of Fractional Order Plants [C] // IEEE ICMA. Niagara Falls, 2005: 216-221.

About the authors: Qian Lijun, born in 1962, is a professor and doctoral supervisor at the School of Mechanical and Automotive Engineering, Hefei University of Technology; his research interests are modern automotive design theory and methods, electric vehicle technology, and automotive electronic control. Hu Weilong, born in 1988, and Qiu Lihong, born in 1989, are doctoral candidates at the same school. Liu Shaojun, born in 1989, is a master's candidate at the same school.

Nanchong Senior High School, Sichuan Province: Grade 11 English Midterm Exam, First Semester of the 2024-2025 School Year

(Full marks: 150 points; time allowed: 120 minutes)

Part I: Listening Comprehension (two sections, 30 points)

Section A (5 questions; 1.5 points each; 7.5 points)

Listen to the following 5 conversations. After each conversation there is one question; choose the best answer from the three options A, B and C, and mark it in the corresponding place on the answer sheet. After each conversation you will have 10 seconds to answer the question and to read the next one. Each conversation is played only once.

1. What is the woman going to do tomorrow? A. Go fishing. B. Go out with a friend. C. Go to the hospital.
2. How will the woman get to the museum? A. By taking a shortcut. B. By taking the subway. C. By walking through the park.
3. Where will the woman go on vacation? A. To New York City. B. To Los Angeles. C. To Burbank.
4. Who is Mike? A. Miguel's friend. B. Melvin's friend. C. The girl's brother.
5. What do the speakers mainly talk about? A. The man's hair. B. A photo of the man. C. The woman's new hairstyle.

Section B (15 questions; 1.5 points each; 22.5 points)

Listen to the following 5 conversations or monologues. After each one there are several questions; choose the best answer from the three options A, B and C, and mark it in the corresponding place on the answer sheet. Before listening to each conversation or monologue you will have 5 seconds per question to read the questions; after listening, you will have 5 seconds per question to answer.

A Detailed Analysis of the KDD CUP 99 Data Set (2009)

NRC Publications Archive (NPArC). Publisher's version: Proceedings of the Second IEEE Symposium on Computational Intelligence for Security and Defence Applications, 2009-07-10.

A Detailed Analysis of the KDD CUP 99 Data Set
Mahbod Tavallaee, Ebrahim Bagheri, Wei Lu, and Ali A. Ghorbani

Abstract: During the last decade, anomaly detection has attracted the attention of many researchers seeking to overcome the weakness of signature-based IDSs in detecting novel attacks, and KDDCUP'99 is the most widely used data set for the evaluation of these systems. Having conducted a statistical analysis on this data set, we found two important issues which highly affect the performance of evaluated systems and result in a very poor evaluation of anomaly detection approaches. To solve these issues, we have proposed a new data set, NSL-KDD, which consists of selected records of the complete KDD data set and does not suffer from any of the mentioned shortcomings.

I. INTRODUCTION

With the enormous growth of computer network usage and the huge increase in the number of applications running on top of it, network security is becoming increasingly important. As shown in [1], all computer systems suffer from security vulnerabilities that are both technically difficult and economically costly for manufacturers to solve. Therefore, the role of Intrusion Detection Systems (IDSs), as special-purpose devices that detect anomalies and attacks in the network, is becoming more important. Research in the intrusion detection field has long focused on anomaly-based and misuse-based detection techniques. While misuse-based detection is generally favored in commercial products due to its predictability and high accuracy, in academic research anomaly detection is typically conceived as the more powerful approach due to its theoretical potential for addressing novel attacks.

Conducting a thorough analysis of the recent research trend in anomaly detection, one will encounter several machine learning methods reported to have a very high detection rate of 98% while keeping the false alarm rate at 1% [2].
However, when we look at state-of-the-art IDS solutions and commercial tools, there is no evidence of the use of anomaly detection approaches, and practitioners still consider it an immature technology. To find the reason for this contrast, we studied the details of the research done in anomaly detection and considered various aspects such as learning and detection approaches, training data sets, testing data sets, and evaluation methods. Our study shows that there are some inherent problems in the KDDCUP'99 data set [3], which is widely used as one of the few publicly available data sets for network-based anomaly detection systems.

(Mahbod Tavallaee, Wei Lu, and Ali A. Ghorbani are with the Faculty of Computer Science, University of New Brunswick, Fredericton, NB, Canada (email: {m.tavallaee, wlu, ghorbani}@unb.ca); Ebrahim Bagheri is with the Institute for Information Technology, National Research Council Canada (email: ebrahim.bagheri@nrc-cnrc.gc.ca). This work was supported by funding from the Atlantic Canada Opportunity Agency (ACOA) through the Atlantic Innovation Fund (AIF) to Dr. Ali Ghorbani.)

The first important deficiency in the KDD data set is the huge number of redundant records. Analyzing the KDD train and test sets, we found that about 78% and 75% of the records are duplicated in the train and test set, respectively. This large number of redundant records in the train set will cause learning algorithms to be biased towards the more frequent records, and thus prevent them from learning infrequent records, which are usually more harmful to networks, such as U2R attacks. The existence of these repeated records in the test set, on the other hand, will cause the evaluation results to be biased towards methods that have better detection rates on the frequent records.

In addition, to analyze the difficulty level of the records in the KDD data set, we employed 21 learned machines (7 learners, each trained 3 times with different train sets) to label the records of the entire KDD train and test sets, which provides us with 21 predicted labels for each record. Surprisingly, about 98% of the records in the train set and 86% of the records in the test set were correctly classified by all 21 learners. The reason we computed these statistics on both the KDD train and test sets is that in many papers, random parts of the KDD train set are used as test sets; as a result, such papers achieve about 98% classification rate with very simple machine learning methods. Even applying the KDD test set results in a minimum classification rate of 86%, which makes the comparison of IDSs quite difficult, since they all vary in the range of 86% to 100%.

In this paper, we provide a solution to the two mentioned issues, resulting in new train and test sets that consist of selected records of the complete KDD data set. The provided data set does not suffer from any of the mentioned problems. Furthermore, the numbers of records in the train and test sets are reasonable, which makes it affordable to run experiments on the complete set without having to randomly select a small portion.
Consequently, evaluation results of different research works will be consistent and comparable. The new version of the KDD data set, NSL-KDD, is publicly available to researchers through our website (http://nsl.cs.unb.ca/KDD/NSL-KDD.html). Although the data set still suffers from some of the problems discussed by McHugh [4] and may not be a perfect representative of existing real networks, because of the lack of public data sets for network-based IDSs we believe it can still be applied as an effective benchmark data set to help researchers compare different intrusion detection methods.

The rest of the paper is organized as follows. Section II introduces the KDD CUP 99 data set, which is widely used in anomaly detection. In Section III, we first review the issues in DARPA'98 and then discuss the possible existence of those problems in KDD'99. The statistical observations on the KDD data set are explained in Section IV. Section V provides some solutions for the existing problems in the KDD data set. Finally, we draw conclusions in Section VI.

II. KDD CUP 99 DATA SET DESCRIPTION

Since 1999, KDD'99 [3] has been the most widely used data set for the evaluation of anomaly detection methods. This data set was prepared by Stolfo et al. [5] and is built based on the data captured in the DARPA'98 IDS evaluation program [6]. DARPA'98 comprises about 4 gigabytes of compressed raw (binary) tcpdump data of 7 weeks of network traffic, which can be processed into about 5 million connection records, each of about 100 bytes. The two weeks of test data yield around 2 million connection records. The KDD training data set consists of approximately 4,900,000 single connection vectors, each of which contains 41 features and is labeled as either normal or an attack, with exactly one specific attack type. The simulated attacks fall into one of the following four categories:

1) Denial of Service (DoS): an attack in which the attacker makes some computing or memory resource too busy or too full to handle legitimate requests, or denies legitimate users access to a machine.
2) User to Root (U2R): a class of exploit in which the attacker starts out with access to a normal user account on the system (perhaps gained by sniffing passwords, a dictionary attack, or social engineering) and is able to exploit some vulnerability to gain root access to the system.
3) Remote to Local (R2L): occurs when an attacker who can send packets to a machine over a network, but who does not have an account on that machine, exploits some vulnerability to gain local access as a user of that machine.
4) Probing: an attempt to gather information about a network of computers for the apparent purpose of circumventing its security controls.

It is important to note that the test data is not drawn from the same probability distribution as the training data, and it includes specific attack types not present in the training data, which makes the task more realistic. Some intrusion experts believe that most novel attacks are variants of known attacks, and that the signature of a known attack can be sufficient to catch novel variants.
The data sets contain a total of 24 training attack types, with an additional 14 types appearing in the test data only. The names and detailed descriptions of the training attack types are listed in [7]. KDD'99 features can be classified into three groups:

1) Basic features: this category encapsulates all the attributes that can be extracted from a TCP/IP connection. Most of these features lead to an implicit delay in detection.

2) Traffic features: this category includes features that are computed with respect to a window interval and is divided into two groups (a sketch of one such feature follows this list):
a) "same host" features: examine only the connections in the past 2 seconds that have the same destination host as the current connection, and calculate statistics related to protocol behavior, service, etc.
b) "same service" features: examine only the connections in the past 2 seconds that have the same service as the current connection.
These two types of "traffic" features are called time-based. However, several slow probing attacks scan hosts (or ports) using a time interval much larger than 2 seconds, for example one probe per minute; such attacks do not produce intrusion patterns within a 2-second window. To solve this problem, the "same host" and "same service" features are re-calculated over a window of the last 100 connections rather than a 2-second time window; these are called connection-based traffic features.

3) Content features: unlike most DoS and Probing attacks, R2L and U2R attacks do not produce frequent sequential intrusion patterns. DoS and Probing attacks involve many connections to some host(s) in a very short period of time, whereas R2L and U2R attacks are embedded in the data portions of the packets and normally involve only a single connection. Detecting these attacks requires features that look for suspicious behavior in the data portion, e.g., the number of failed login attempts; these are called content features.
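As a hedged illustration of the time-based "same host" features described above (the column names and the toy connection log are invented here; this is not the original KDD feature-extraction code), one could write:

```python
import pandas as pd

# Hypothetical connection log, ordered by time in seconds.
conns = pd.DataFrame({
    "conn_time": [0.0, 0.5, 1.0, 1.2, 2.6, 2.8],
    "dst_host":  ["a", "a", "b", "a", "a", "b"],
})

def same_host_count(df: pd.DataFrame, window: float = 2.0) -> pd.Series:
    """Count, for each connection, the earlier connections to the same
    destination host within the past `window` seconds."""
    counts = []
    for i in range(len(df)):
        row = df.iloc[i]
        earlier = df.iloc[:i]
        hits = ((earlier["dst_host"] == row["dst_host"]) &
                (row["conn_time"] - earlier["conn_time"] <= window))
        counts.append(int(hits.sum()))
    return pd.Series(counts, index=df.index)

conns["same_host_2s"] = same_host_count(conns)  # time-based feature
# A connection-based variant would instead look at the last 100 rows:
# earlier = df.iloc[max(0, i - 100):i]
print(conns)
```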
III. INHERENT PROBLEMS OF THE KDD'99 DATA SET

As mentioned in the previous section, KDD'99 is built on the data captured in DARPA'98, which has been criticized by McHugh [4], mainly because of the characteristics of the synthetic data. As a result, some of the existing problems in DARPA'98 remain in KDD'99. However, there are some deliberate or unintentional improvements, along with additional problems. In the following, we first review the issues in DARPA'98 and then discuss the possible existence of those problems in KDD'99. Finally, we discuss new issues observed in the KDD data set.

1) For the sake of privacy, the experiments chose to synthesize both the background and the attack data, and the data is claimed to be similar to that observed during several months of sampling data from a number of Air Force bases. However, neither analytical nor experimental validation of the data's false-alarm characteristics was undertaken. Furthermore, the workload of the synthesized data does not seem to be similar to the traffic in real networks.

2) Traffic collectors such as tcpdump, which was used in DARPA'98, are very likely to become overloaded and drop packets under heavy traffic load. However, there was no examination of the possibility of dropped packets.

3) There is no exact definition of the attacks. For example, probing is not necessarily an attack type unless the number of iterations exceeds a specific threshold. Similarly, a packet that causes a buffer overflow is not always representative of an attack. Under such conditions, there should be an agreement on the definitions between the evaluator and the evaluated. In DARPA'98, however, there are no specific definitions of the network attacks.

In addition, there are some critiques of the attack taxonomies and performance measures. However, these issues are not of much interest in this paper, since most anomaly detection systems work with binary labels (anomalous vs. normal) rather than identifying detailed attack information. Besides, the performance measure applied in the DARPA'98 evaluation, ROC curves, has been widely criticized, and many researchers have since proposed new measures to overcome the existing deficiencies [8], [9], [10], [11], [12].

While McHugh's critique was mainly based on the procedure used to generate the data set rather than on analysis of the data, Mahoney and Chan [13] analyzed the DARPA background network traffic and found evidence of simulation artifacts that could result in an overestimation of the performance of some anomaly detection techniques. Their paper mentions five types of anomalies leading to attack detection; however, analysis of the attacks in the DARPA data set revealed that many did not fit into any of these categories, which is likely caused by simulation artifacts. As an example, the TTL (time to live) values of 126 and 253 appear only in hostile traffic, whereas in most background traffic the values are 127 and 254. Similarly, some attacks can be identified by anomalous source IP addresses or by anomalies in the TCP window-size field.

Fortunately, the aforementioned simulation artifacts do not affect the KDD data set, since the 41 features used in KDD are not related to any of the weaknesses mentioned in [13]. However, KDD suffers from additional problems not present in the DARPA data set. In [14], Portnoy et al. partitioned the KDD data set into ten subsets, each containing approximately 490,000 instances or 10% of the data. They observed that the distribution of the attacks in the KDD data set is very uneven, which made cross-validation very difficult: many of these subsets contained instances of only a single type. For example, the 4th, 5th, 6th, and 7th 10% portions of the full data set contained only smurf attacks, and the data instances in the 8th subset were almost entirely neptune intrusions.
Similarly, the same problem with smurf and neptune attacks in the KDD training data set is reported in [15]. The authors mention two problems caused by including these attacks in the data set. First, these two types of DoS attacks constitute over 71% of the testing data set, which completely distorts the evaluation. Second, since they generate large volumes of traffic, they are easily detectable by other means, and there is no need for anomaly detection systems to find them.

IV. STATISTICAL OBSERVATIONS

As mentioned earlier, there are problems in the KDD data set which cause evaluation results on it to be unreliable. In this section we perform a set of experiments to show the existing deficiencies in KDD.

A. Redundant Records

One of the most important deficiencies in the KDD data set is the huge number of redundant records, which causes learning algorithms to be biased towards the frequent records and thus prevents them from learning infrequent records, which are usually more harmful to networks, such as U2R and R2L attacks. In addition, the existence of these repeated records in the test set causes the evaluation results to be biased towards methods with better detection rates on the frequent records.

To solve this issue, we removed all the repeated records in the entire KDD train and test sets and kept only one copy of each record. Tables I and II show the statistics of the reduction of repeated records in the KDD train and test sets, respectively.

TABLE I. STATISTICS OF REDUNDANT RECORDS IN THE KDD TRAIN SET
           Original records   Distinct records   Reduction rate
Attacks    3,925,650          262,178            93.32%
Normal     972,781            812,814            16.44%
Total      4,898,431          1,074,992          78.05%

TABLE II. STATISTICS OF REDUNDANT RECORDS IN THE KDD TEST SET
           Original records   Distinct records   Reduction rate
Attacks    250,436            29,378             88.26%
Normal     60,591             47,911             20.92%
Total      311,027            77,289             75.15%

While doing this, we encountered two invalid records in the KDD test set, numbers 136,489 and 136,497. These two records contain an invalid value, ICMP, as their service feature, so we removed them from the KDD test set.
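The deduplication step just described can be illustrated with a short pandas sketch; the file name and column naming are placeholders, not the authors' code:

```python
import pandas as pd

# Placeholder path and schema: KDD'99 records are 41 features plus a label.
cols = [f"f{i}" for i in range(41)] + ["label"]
kdd = pd.read_csv("kddcup.data", header=None, names=cols)

# Keep one copy of each record, mirroring the reduction in Tables I and II.
distinct = kdd.drop_duplicates()
reduction = 1.0 - len(distinct) / len(kdd)
print(f"{len(kdd)} -> {len(distinct)} records ({reduction:.2%} removed)")
```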
B. Level of Difficulty

The typical approach to anomaly detection with the KDD data set is to employ a customized machine learning algorithm to learn the general behavior of the data set, in order to be able to differentiate between normal and malicious activities. For this purpose, the data set is divided into training and test segments: the learner is trained on the training portion of the data set and is then evaluated
not an appropriate option.V.O UR S OLUTIONTo solve the issues mentioned in the previous section,we first removed all the redundant records in both train and test sets.Furthermore,to create a more challenging subset of the KDD data set,we randomly sampled records from the #successfulPrediction value groups shown in Figure 1and2in such a way that the number of records selected from each group is inversely proportional to the percentage of records in the original#successfulPrediction value groups.For instance,the number of records in the0-5#successfulPrediction value group of the KDD train set constitutes0.04%of the original records,therefore, 99.96%of the records in this group are included in the generated sample.Tables III and IV show the detailed statistics of randomly selected records.The generated data sets,KDDTrain+and KDDTest+,TABLE IVS TATISTICS OF RANDOMLY SELECTED RECORDS FROM KDD TEST SET Distinct Records Percentage Selected Records 0-55890.765856-10847 1.1083811-153,540 4.583,37816-207,84510.157,0492164,46883.4110,694Total77,289100.0022,544Fig.3.The performance of the selected learning machines onKDDTestFig.4.The performance of the selected learning machines on KDDTest+Fig.5.The performance of the selected learning machines on KDDTest −21included 125,973and 22,544records,respectively.Further-more,one more test set was generated that did not include any of the records that had been correctly classified by all 21learners,KDDTest −21,which incorporated 11,850records.For experimental purposes,we employed the first 20%of the records in KDDTrain +as the train set,having trained the learning methods,we applied the learned models on three test sets,namely KDDTest (original KDD test set),KDDTest +and KDDTest −21.The result of the evaluation of the learners on these data sets are shown in Figures 3,4and 5,respectively.As can be seen in Figure 3,the accuracy rate of theclassifiers on KDDTest is relatively high.This shows that the original KDD test set is skewed and unproportionately distributed,which makes it unsuitable for testing network-based anomaly detection classifiers.The results of the accu-racy and performance of learning machines on the KDD’99data set are hence unreliable and cannot be used as good indicators of the ability of the classifier to serve as a discriminative tool in network-based anomaly detection.On the contrary,KDDTest +and KDDTest −21test set provide more accurate information about the capability of the clas-sifiers.As an example,classification of SVM on KDDTest is 65.01%which is quite poor compared to other learningapproaches.However,SVM is the only learning technique whose performance is improved on KDDTest+.Analyzing both test sets,we found that SVM wrongly detects one of the most frequent records in KDDTest,which highly affects its detection performance.In contrast,in KDDTest+since this record is only occurred once,it does not have any effects on the classification rate of SVM,and provides better evaluation of learning methods.VI.C ONCLUDING R EMARKSIn this paper,we statistically analyzed the entire KDD data set.The analysis showed that there are two important issues in the data set which highly affects the performance of evaluated systems,and results in a very poor evaluation of anomaly detection approaches.To solve these issues,we have proposed a new data set,NSL-KDD[24],which consists of selected records of the complete KDD data set.This data set is publicly available for researchers through our website and has the following advantages over the original KDD 
data set:•It does not include redundant records in the train set,so the classifiers will not be biased towards more frequent records.•There is no duplicate records in the proposed test sets;therefore,the performance of the learners are not biased by the methods which have better detection rates on the frequent records.•The number of selected records from each difficulty-level group is inversely proportional to the percentage of records in the original KDD data set.As a result,the classification rates of distinct machine learning methods vary in a wider range,which makes it more efficient to have an accurate evaluation of different learning techniques.•The number of records in the train and test sets are reasonable,which makes it affordable to run the exper-iments on the complete set without the need to randomly select a small portion.Consequently,evaluation results of different research works will be consistent and com-parable.Although,the proposed data set still suffers from some of the problems discussed by McHugh[4]and may not be a perfect representative of existing real networks,because of the lack of public data sets for network-based IDSs,we believe it still can be applied as an effective benchmark data set to help researchers compare different intrusion detection methods.R EFERENCES[1] ndwehr,A.R.Bull,J.P.McDermott,and W.S.Choi,“Ataxonomy of computer program securityflaws,”ACM Comput.Surv., vol.26,no.3,pp.211–254,1994.[2]M.Shyu,S.Chen,K.Sarinnapakorn,and L.Chang,“A novelanomaly detection scheme based on principal component classifier,”Proceedings of the IEEE Foundations and New Directions of Data Mining Workshop,in conjunction with the Third IEEE International Conference on Data Mining(ICDM03),pp.172–179,2003.[3]KDD Cup1999.Available on:/databases/kddcup99/kddcup99.html,Ocotber2007.[4]J.McHugh,“Testing intrusion detection systems:a critique of the1998and1999darpa intrusion detection system evaluations as performed by lincoln laboratory,”ACM Transactions on Information and System Security,vol.3,no.4,pp.262–294,2000.[5]S.J.Stolfo,W.Fan,W.Lee,A.Prodromidis,and P.K.Chan,“Cost-based modeling for fraud and intrusion detection:Results from the jam project,”discex,vol.02,p.1130,2000.[6]R.P.Lippmann,D.J.Fried,I.Graf,J.W.Haines,K.R.Kendall,D.McClung,D.Weber,S.E.Webster,D.Wyschogrod,R.K.Cun-ningham,and M.A.Zissman,“Evaluating intrusion detection systems: The1998darpa off-line intrusion detection evaluation,”discex,vol.02, p.1012,2000.[7]MIT Lincoln Labs,1998DARPA Intru-sion Detection Evaluation.Available on: /mission/communications/ist/corpora/ideval/index.html,February2008.[8]S.Axelsson,“The base-rate fallacy and the difficulty of intrusiondetection,”ACM Transactions on Information and System Security (TISSEC),vol.3,no.3,pp.186–205,2000.[9]J.Gaffney Jr and J.Ulvila,“Evaluation of intrusion detectors:Adecision theory approach,”in Proceedings of IEEE Symposium on Security and Privacy,(S&P),pp.50–61,2001.[10]G.Di Crescenzo,A.Ghosh,and R.Talpade,“Towards a theory ofintrusion detection,”Lecture notes in computer science,vol.3679, p.267,2005.[11] A.Cardenas,J.Baras,and K.Seamon,“A framework for theevaluation of intrusion detection systems,”in Proceedings of IEEE Symposium on Security and Privacy,(S&P),p.15,2006.[12]G.Gu,P.Fogla, D.Dagon,W.Lee,and B.Skori´c,“Measuringintrusion detection capability:An information-theoretic approach,”in Proceedings of ACM Symposium on Information,computer and communications security(ASIACCS06),pp.90–101,ACM New York, NY,USA,2006.[13]M.Mahoney and P.Chan,“An Analysis 
of the1999DARPA/LincolnLaboratory Evaluation Data for Network Anomaly Detection,”LEC-TURE NOTES IN COMPUTER SCIENCE,pp.220–238,2003. [14]L.Portnoy,E.Eskin,and S.Stolfo,“Intrusion detection with unlabeleddata using clustering,”Proceedings of ACM CSS Workshop on Data Mining Applied to Security,Philadelphia,PA,November,2001. [15]K.Leung and C.Leckie,“Unsupervised anomaly detection in networkintrusion detection using clusters,”Proceedings of the Twenty-eighth Australasian conference on Computer Science-Volume38,pp.333–342,2005.[16]J.Quinlan,C4.5:Programs for Machine Learning.Morgan Kaufmann,1993.[17]G.John and ngley,“Estimating continuous distributions inBayesian classifiers,”in Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence,pp.338–345,1995.[18]R.Kohavi,“Scaling up the accuracy of naive-Bayes classifiers:Adecision-tree hybrid,”in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining,vol.7,1996.[19]L.Breiman,“Random Forests,”Machine Learning,vol.45,no.1,pp.5–32,2001.[20] D.Aldous,“The continuum random tree.I,”The Annals of Probability,pp.1–28,1991.[21] D.Ruck,S.Rogers,M.Kabrisky,M.Oxley,and B.Suter,“Themultilayer perceptron as an approximation to a Bayes optimaldiscrim-inant function,”IEEE Transactions on Neural Networks,vol.1,no.4, pp.296–298,1990.[22] C.Chang and C.Lin,“LIBSVM:a library for sup-port vector machines,”2001.Software available at .tw/cjlin/libsvm.[23]“Waikato environment for knowledge analysis(weka)version3.5.7.”Available on:/ml/weka/,June,2008. [24]“Nsl-kdd data set for network-based intrusion detection systems.”Available on:http://nsl.cs.unb.ca/KDD/NSL-KDD.html,March2009.。

Census-Income (KDD) Data Set

Census-Income (KDD) Data Set

Data summary: This data set contains weighted census data extracted from the 1994 and 1995 Current Population Surveys conducted by the U.S. Census Bureau.

Keywords: multivariate, classification, UCI, census income, KDD
Data format: TEXT
Data use: This data set is used for classification.

Detailed description:

Source. Original owner: U.S. Census Bureau, United States Department of Commerce. Donors: Terran Lane and Ronny Kohavi, Data Mining and Visualization, Silicon Graphics.

Data set information: This data set contains weighted census data extracted from the 1994 and 1995 Current Population Surveys conducted by the U.S. Census Bureau. The data contains 41 demographic and employment-related variables. The instance weight indicates the number of people in the population that each record represents, due to stratified sampling; to do real analysis and derive conclusions, this field must be used, but it should *not* be used in classifiers. There is one instance per line, with comma-delimited fields. There are 199,523 instances in the data file and 99,762 in the test file. The data was split into train/test in approximately 2/3, 1/3 proportions using MineSet's MIndUtil mineset-to-mlc.

Attribute information: More information detailing the meaning of the attributes can be found in the Census Bureau's documentation. To use the data descriptions at that site, the following mappings to the Census Bureau's internal database column names are needed: age (AAGE), class of worker (ACLSWKR), industry code (ADTIND), occupation code (ADTOCC), adjusted gross income (AGI), education (AHGA), wage per hour (AHRSPAY), enrolled in edu inst last wk (AHSCOL), marital status (AMARITL), major industry code (AMJIND), major occupation code (AMJOCC), race (ARACE), hispanic origin (AREORGN), sex (ASEX), member of a labor union (AUNMEM), reason for unemployment (AUNTYPE), full or part time employment stat (AWKSTAT), capital gains (CAPGAIN), capital losses (CAPLOSS), dividends from stocks (DIVVAL), federal income tax liability (FEDTAX), tax filer status (FILESTAT), region of previous residence (GRINREG), state of previous residence (GRINST), detailed household and family stat (HHDFMX), detailed household summary in household (HHDREL), instance weight (MARSUPWT), migration code-change in msa (MIGMTR1), migration code-change in reg (MIGMTR3), migration code-move within reg (MIGMTR4), live in this house 1 year ago (MIGSAME), migration prev res in sunbelt (MIGSUN), num persons worked for employer (NOEMP), family members under 18 (PARENT), total person earnings (PEARNVAL), country of birth father (PEFNTVTY), country of birth mother (PEMNTVTY), country of birth self (PENATVTY), citizenship (PRCITSHP), total person income (PTOTVAL), own business or self employed (SEOTR), taxable income amount (TAXINC), fill inc questionnaire for veteran's admin (VETQVA), veterans benefits (VETYN), weeks worked in year (WKSWORK).

Note that incomes have been binned at the $50K level to present a binary classification problem, much like the original UCI/ADULT database. The goal field of this data, however, was drawn from the "total person income" field rather than the "adjusted gross income" and may therefore behave differently from the original ADULT goal field.
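A hedged loading sketch for this data set follows. The file name and the position of the weight column are assumptions made for illustration; the key point, per the description above, is that MARSUPWT is used for population estimates but excluded from classifier features:

```python
import pandas as pd

# Placeholder file name; the archive ships comma-delimited text files.
df = pd.read_csv("census-income.data", header=None)

label = df.iloc[:, -1]             # income, binned at the $50K level
WEIGHT_COL = 24                    # assumed position of MARSUPWT in the file

weights = df.iloc[:, WEIGHT_COL]   # for weighted population statistics only

# Per the documentation, drop the instance weight from the features.
X = df.drop(columns=[WEIGHT_COL, df.columns[-1]])
print(X.shape, label.value_counts(normalize=True))
```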

KDD-Cup (Knowledge Discovery and Data Mining Competition): An Introduction
- The goal: to design models to support website personalization and to improve the profitability of the site by increasing customer response.
- Questions: when given a set of page views, ...

ROBOCUP

About ACM KDDCUP
- ACM KDD: the premiere conference in knowledge discovery and data mining.
- ACM KDDCUP: a worldwide competition held in conjunction with the ACM KDD conferences. It showcases the best methods for discovering higher-level knowledge from data, helps to close the gap between research and industry, and stimulates further KDD research and development.

Year          1997  1998  1999  2000  2005  2011
Submissions     16    21    24    30    32  1000+
Algorithms (up to 2000)

KDD Cup 97

KDDCUP 1998 Results
[Bar chart: KDDCUP 1998 results, with a dollar-value axis from $5,000 to $70,000]

Degree English Examination for Adult Higher Education (Chengkao)

1. What is the primary purpose of writing a business email? A. To express personal emotions. B. To provide detailed instructions on a hobby. C. To communicate professionally and efficiently in a work setting. D. To discuss current events with friends. (Answer: C)
2. Which of the following is NOT a common feature of academic writing? A. Use of formal language. B. Inclusion of personal anecdotes. C. Clear organization and structure. D. Citation of sources. (Answer: B)
3. In which situation would you use a SWOT analysis? A. When creating a personal journal entry. B. When evaluating the strengths, weaknesses, opportunities, and threats of a business or project. C. When writing a fictional short story. D. When planning a casual social gathering. (Answer: B)
4. What does "ROI" stand for in the context of business and finance? A. Return On Investment B. Random Order Input C. Rapid Online Inquiry D. Resource Optimization Initiative (Answer: A)
5. Which of these is an example of active listening in a business meeting? A. Interrupting the speaker to share your own opinion. B. Checking your phone while the other person is talking. C. Nodding and providing verbal cues to show understanding. D. Thinking about your next response without paying attention to the speaker. (Answer: C)
6. What is the main goal of a marketing strategy? A. To increase production costs. B. To decrease the quality of products. C. To identify and satisfy customer needs and wants. D. To limit competition in the market. (Answer: C)
7. Which of the following is a key element of effective time management? A. Procrastinating tasks until the last minute. B. Prioritizing tasks based on urgency and importance. C. Multitasking constantly without focus. D. Avoiding planning and spontaneously tackling tasks as they arise. (Answer: B)
8. In project management, what does the acronym "SMART" stand for when setting goals? A. Specific, Measurable, Achievable, Relevant, Time-bound B. Simple, Modern, Accessible, Reliable, Timely C. Strategic, Minimal, Attractive, Responsive, Technological D. Swift, Meticulous, Ambitious, Resourceful, Tactical (Answer: A)
9. Which of the following best describes the concept of "supply and demand" in economics? A. The relationship between the quantity of a product available and the desire for that product. B. The process of producing goods and services. C. The study of how money is created and managed. D. The analysis of government spending and taxation. (Answer: A)
10. What is the purpose of a feasibility study in starting a new business? A. To determine the company's annual revenue. B. To assess the legal requirements for operating the business. C. To evaluate the practicality and potential success of the business idea. D. To design the company's logo and branding. (Answer: C)

kdd2008christen-febrl-demo

Febrl - An Open Source Data Cleaning, Deduplication and Record Linkage System with a Graphical User Interface

Peter Christen
Department of Computer Science, The Australian National University, Canberra ACT 0200, Australia
peter.christen@anu.edu.au

ABSTRACT
Matching records that refer to the same entity across databases is becoming an increasingly important part of many data mining projects, as often data from multiple sources needs to be matched in order to enrich data or improve its quality. Significant advances in record linkage techniques have been made in recent years. However, many new techniques are either implemented in research proof-of-concept systems only, or they are hidden within expensive 'black box' commercial software. This makes it difficult for both researchers and practitioners to experiment with new record linkage techniques, and to compare existing techniques with new ones. The Febrl (Freely Extensible Biomedical Record Linkage) system aims to fill this gap. It contains many recently developed techniques for data cleaning, deduplication and record linkage, and encapsulates them into a graphical user interface (GUI). Febrl thus allows even inexperienced users to learn and experiment with both traditional and new record linkage techniques. Because Febrl is written in Python and its source code is available, it is fairly easy to integrate new record linkage techniques into it. Therefore, Febrl can be seen as a tool that allows researchers to compare various existing record linkage techniques with their own, enabling the record linkage research community to conduct their work more efficiently. Additionally, Febrl is suitable as a training tool for new record linkage users, and it can also be used for practical linkage projects with data sets that contain up to several hundred thousand records.

Categories and Subject Descriptors: H.2.8 [Database applications]: Data mining
General Terms: Algorithms, Experimentation
Keywords: Data matching, data linkage, deduplication, data cleaning, open source software, Python

1. PROJECT BACKGROUND

Febrl has been developed since 2002 as part of a collaborative research project conducted between the Australian National University in Canberra and the New South Wales Department of Health in Sydney, Australia. The objective of this project is to develop new techniques for improved data cleaning and standardisation, deduplication and record linkage within the domain of health databases.

Copyright is held by the author/owner(s). KDD'08, August 24-27, 2008, Las Vegas, Nevada, USA. ACM 978-1-60558-193-4/08/08.

The Febrl system is written in the programming language Python, which is an ideal platform for rapid prototype development, as it provides data structures such as sets, lists and dictionaries (associative arrays) that allow efficient handling of very large data sets. It also includes many modules offering a large variety of functionalities. It has excellent built-in string handling capabilities, and its large number of extension modules facilitate, for example, database access and GUI development. The Febrl GUI is based on the PyGTK library and the Glade toolkit.

Febrl is published under an open source software licence.
Due to the availability of its source code, Febrl is suitable for the rapid development, implementation and testing of novel record linkage algorithms and techniques, as well as for both new and experienced users to learn about, and experiment with, various record linkage techniques. Since 2002, Febrl has been hosted on the repository at https://sourceforge.net/projects/febrl/, and the total number of downloads of Febrl files had reached 9,840 by 14 June 2008. To the best of the author's knowledge, Febrl is the only open source record linkage system with a GUI that supports data cleaning and standardisation, deduplication and record linkage. The current Febrl-0.4 version includes the source code, several example data sets (as available from the SecondString toolkit), and a data set generator. The documentation contains several papers that describe the techniques implemented in Febrl, and a manual that includes several step-by-step tutorials. Compared to earlier versions, Febrl-0.4 contains not only a GUI but also a variety of new techniques, as described below. Some of the screenshots that follow show an example linkage of the 'Census' data set from SecondString.

2. STRUCTURE AND FUNCTIONALITY

The Febrl GUI has been developed with the objective of making Febrl more accessible to non-technical record linkage users [8]. The structure of the Febrl GUI follows the Rattle open source data mining tool [17]. The basic idea is to have a window that contains one tab (similar to tabs in modern Web browsers) per major step of the record linkage process. The start-up Febrl GUI is shown in Figure 1. At first, only two tabs are visible; additional tabs appear once the input data has been initialised.

[Figure 1: Initial Febrl user interface after start-up.]

On each tab, the user can select methods and their parameters, and then confirm these settings by clicking the 'Execute' button. Corresponding Febrl Python code is generated and shown in the 'Log' tab. Once all the necessary steps are set up, the generated Python code can be saved and run outside the GUI, or a project can be started and its results evaluated from within the Febrl GUI. The major tabs are described in more detail, with corresponding screenshots, in the following sections.

2.1 Input Data Initialisation

A user first has to select the type of project to be conducted: (a) cleaning and standardisation of a data set, (b) deduplication of a data set, or (c) linkage of two data sets. The 'Data' tab of the Febrl GUI will change accordingly and show either one or two input data set selection areas.
2.1 Input Data Initialisation
A user first has to select the type of project to be conducted: (a) cleaning and standardisation of a data set, (b) deduplication of a data set, or (c) linkage of two data sets. The 'Data' tab of the Febrl GUI will change accordingly and either show one or two input data set selection areas. Currently, several text based file formats are supported, including the commonly used CSV (comma separated values) format. Once a file has been selected, its first few lines will be shown, as illustrated in Figure 2. This enables the user to verify the chosen settings, or adjust them if required. When satisfied, a click on the 'Execute' button will confirm the settings, and the tabs for data exploration and, depending upon the project type selected, standardisation, or indexing, comparison and classification will become visible.

Figure 2: Febrl user interface for a linkage project after the 'Census' input data sets have been initialised.

2.2 Data Exploration
The 'Explore' tab enables the user to analyse the input data set(s) to get a better understanding of its/their content and quality. After a click on the 'Execute' button, the data set(s) will be read and all fields (or attributes, columns) will be analysed. A report will be displayed that for each field provides information about the number of different values in it, the alphabetically smallest and largest values, the most and least frequent values, the quantiles distribution of the values, the number of records with missing values, as well as a guess of the type of the field (if it contains only digits, only letters, or is of mixed type). A summary table of this analysis is then displayed, as shown in Figure 3.

Figure 3: Data exploration tab showing summary analysis of record fields (or attributes, columns).

2.3 Data Cleaning and Standardisation
Data cleaning and standardisation using the Febrl GUI is currently done separately from a linkage or deduplication. A data set can be cleaned and standardised, and is then written into a new data set, which in turn can then be deduplicated or used for a linkage. Currently, Febrl contains standardisers for names, addresses, dates, and telephone numbers. The name standardiser uses a rule-based approach for simple names (such as those made of one given- and one surname only) in combination with a probabilistic hidden Markov model (HMM) approach for more complex names [11], while address standardisation is fully based on a HMM approach [9]. These HMMs currently have to be trained outside of the Febrl GUI, using separate Febrl modules. Dates are standardised using a list of format strings that provide the expected date formats likely to be found in the input data set. Telephone numbers are also standardised using a rule-based approach. Once initialised and confirmed with a click on 'Execute', on the 'Output/Run' tab (see below) the file name of the standardised output file can be chosen, and a standardisation project can then be started by clicking 'Execute' on the 'Output/Run' tab.

Figure 4: Example date and telephone number standardisers (for a synthetic Febrl data set).
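The format-list idea behind the date standardiser can be illustrated independently of Febrl's own modules (a generic sketch; the format list and example values below are assumptions for illustration):

# Sketch of format-string date standardisation (not Febrl's actual code):
# try a list of expected input formats and emit a normalised date.
from datetime import datetime

DATE_FORMATS = ["%d/%m/%Y", "%d %B %Y", "%Y-%m-%d", "%d.%m.%y"]  # assumed list

def standardise_date(raw):
    """Return the date in ISO form, or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next expected format
    return None

print(standardise_date("24/08/2008"))      # 2008-08-24
print(standardise_date("24 August 2008"))  # 2008-08-24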
2.4 Indexing (Blocking) Definition
Blocking or indexing is used to reduce the number of detailed record pair comparisons to be done [2]. On the 'Index' tab, a user can select one of seven possible indexing methods. Besides the 'FullIndex' (which will compare all record pairs and thus has a quadratic complexity) and the standard 'BlockingIndex' approach [2] as implemented in many record linkage systems, Febrl contains five recently developed indexing methods [4]: 'SortingIndex', which is based on the sorted neighbourhood approach [15]; 'QGramIndex', which uses sub-strings of length q to allow fuzzy blocking [2]; 'CanopyIndex', which employs overlapping canopy clustering using TF-IDF or Jaccard similarity [12]; 'StringMapIndex', which maps the index key values into a multi-dimensional space and performs canopy clustering on these multi-dimensional objects [16]; and 'SuffixArrayIndex', which generates all suffixes of the index key values and inserts them into a sorted array to enable efficient access to the index key values and generation of the corresponding blocks [1].

Once an index method has been chosen, the actual index keys have to be selected and their various parameters have to be set. Index keys are made of one field value, or a concatenation of several field values, that are often phonetically encoded to group similar sounding values into the same block. Febrl contains nine encoding methods [3], including Soundex, NYSIIS, Phonix, Phonex, and Double-Metaphone.

Figure 5: Example indexing definition using the 'BlockingIndex' method and two index definitions.

2.5 Field Comparison Functions
The similarity functions used to compare the field (attribute) values of record pairs can be selected on the 'Comparison' tab, as shown in Figure 6. Febrl contains 26 similarity functions, including 20 approximate string comparison functions [3], as well as functions specialised for dates, times, ages, or numerical values. All these similarity functions return a numerical value between 0 (total dissimilarity) and 1 (exact match). It is possible to adjust these values by setting agreement and disagreement weights, as well as a special value that will be returned if one or both of the compared values is/are empty. The similarity weights calculated for each compared record pair will be stored in a weight vector, to be used for classifying record pairs in the next step.

Figure 6: An example of three field comparison function definitions.
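A self-contained sketch can illustrate how blocking and field comparison fit together to produce weight vectors (generic stand-ins rather than Febrl's implementations; the records, the blocking key and the q-gram measure are simplified assumptions):

# Sketch of standard blocking plus a q-gram similarity measure producing
# weight vectors (simplified stand-ins, not Febrl's implementations).
from itertools import combinations

records = {  # invented example records
    "a1": {"surname": "smith", "suburb": "dickson"},
    "a2": {"surname": "smyth", "suburb": "dickson"},
    "a3": {"surname": "miller", "suburb": "lyneham"},
}

def qgram_sim(s, t, q=2):
    """Dice coefficient over q-gram sets: 0 (dissimilar) .. 1 (exact)."""
    if s == t:
        return 1.0
    gs = {s[i:i + q] for i in range(len(s) - q + 1)}
    gt = {t[i:i + q] for i in range(len(t) - q + 1)}
    if not gs or not gt:
        return 0.0
    return 2.0 * len(gs & gt) / (len(gs) + len(gt))

# Blocking: only records sharing a blocking key value are compared.
blocks = {}
for rid, rec in records.items():
    blocks.setdefault(rec["suburb"], []).append(rid)

weight_vectors = {}
for ids in blocks.values():
    for r1, r2 in combinations(sorted(ids), 2):
        weight_vectors[(r1, r2)] = [
            qgram_sim(records[r1]["surname"], records[r2]["surname"]),
            1.0 if records[r1]["suburb"] == records[r2]["suburb"] else 0.0,
        ]

print(weight_vectors)  # {('a1', 'a2'): [0.5, 1.0]}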
2.6 Weight Vector Classification
Febrl contains several record pair classifiers, both supervised and unsupervised techniques. The traditional 'FellegiSunter' classifier requires manual setting of two thresholds [13], while with the supervised 'OptimalThreshold' classifier it is assumed that the true match status for all compared record pairs is known, and thus an optimal threshold can be calculated based on the corresponding summed weight vectors. Another supervised classifier is 'SuppVecMachine', which implements a support vector machine. Both the 'KMeans' and 'FarthestFirst' [14] classifiers are unsupervised clustering approaches, and group the weight vectors into a match and a non-match cluster. Various centroid initialisation and distance measures are implemented in Febrl. Finally, the 'TwoStep' classifier, shown in Figure 7, is an unsupervised approach which in a first step selects weight vectors from the compared record pairs that with high likelihood correspond to true matches and true non-matches, and in a second step uses these vectors as training examples for a binary classifier [5, 7, 6].

Figure 7: Example 'Two-Step' unsupervised weight vector classifier.

2.7 Output Files and Running a Project
On this tab (not shown due to space limitation) the user can select various settings of how the match status and the matched record pairs will be saved into files. With a click on 'Execute', the Febrl GUI will ask the user if the generated project should be saved as a Python file, and if the project should be started and run within the GUI. Once started, a window will pop up showing a progress bar.

2.8 Evaluation and Clerical Review
On the 'Evaluation' tab, shown in Figure 8, the results of a deduplication or linkage project are visualised as a histogram of the summed matching weights of all compared record pairs. If the true match and non-match status of record pairs is available, the quality of the conducted linkage will be shown using the measurements of accuracy, precision, recall and F-measure. Also shown are measures that allow the evaluation of the complexity of a deduplication or linkage project (i.e. the number of record pairs generated by the indexing step and their quality); these measures are the reduction ratio, pairs completeness and pairs quality [10].

Figure 8: Evaluation tab showing the matching weight histogram and quality and complexity measures for a linkage.

2.9 Log Tab
On this tab the Febrl Python code generated throughout a project is shown, allowing experienced users to verify the correctness of the generated code. It also enables copying of this code into a user's own Febrl Python modules.

3. ACKNOWLEDGEMENTS
This work is supported by an Australian Research Council (ARC) Linkage Grant LP0453463 and partially funded by the New South Wales Department of Health, Sydney.

4. REFERENCES
[1] A. Aizawa and K. Oyama. A fast linkage detection scheme for multi-source information integration. In WIRI'05, pages 30–39, Tokyo, 2005.
[2] R. Baxter, P. Christen, and T. Churches. A comparison of fast blocking methods for record linkage. In ACM SIGKDD workshop on Data Cleaning, Record Linkage and Object Consolidation, pages 25–27, Washington DC, 2003.
[3] P. Christen. A comparison of personal name matching: Techniques and practical issues. In MCD'06, held at IEEE ICDM'06, Hong Kong, 2006.
[4] P. Christen. Towards parameter-free blocking for scalable record linkage. Technical Report TR-CS-07-03, The Australian National University, Canberra, 2007.
[5] P. Christen. A two-step classification approach to unsupervised record linkage. In AusDM'07, pages 111–119, Gold Coast, Australia, 2007.
[6] P. Christen. Automatic record linkage using seeded nearest neighbour and support vector machine classification. In ACM SIGKDD'08, Las Vegas, 2008.
[7] P. Christen. Automatic training example selection for scalable unsupervised record linkage. In PAKDD'08, Springer LNAI 5012, pages 511–518, Osaka, Japan, 2008.
[8] P. Christen. Febrl - A freely available record linkage system with a graphical user interface. In HDKM'08, CRPIT vol. 80, pages 17–25, Wollongong, Australia, 2008.
[9] P. Christen and D. Belacic. Automated probabilistic address standardisation and verification. In AusDM'05, Sydney, 2005.
[10] P. Christen and K. Goiser. Quality and complexity measures for data linkage and deduplication. In F. Guillet and H. Hamilton, editors, Quality Measures in Data Mining, volume 43 of Studies in Computational Intelligence. Springer, 2007.
[11] T. Churches, P. Christen, K. Lim, and J. X. Zhu. Preparation of name and address data for record linkage using hidden Markov models. BioMed Central Medical Informatics and Decision Making, 2(9), 2002.
[12] W. W. Cohen and J. Richman. Learning to match and cluster large high-dimensional data sets for data integration. In ACM SIGKDD'02, Edmonton, 2002.
[13] I. P. Fellegi and A. B. Sunter. A theory for record linkage. Journal of the American Statistical Association, 64(328):1183–1210, 1969.
[14] K. Goiser and P. Christen. Towards automated record linkage. In AusDM'06, pages 23–31, Sydney, 2006.
[15] M. A. Hernandez and S. J. Stolfo. The merge/purge problem for large databases. In ACM SIGMOD'95, pages 127–138, San Jose, 1995.
[16] L. Jin, C. Li, and S. Mehrotra. Efficient record linkage in large data sets. In DASFAA'03, Tokyo, 2003.
[17] G. J. Williams. Data mining with Rattle and R. Togaware, Canberra, 2008. Software available at: /survivor/.
Authenticity in language testing: some outstanding questions
Jo A. Lewkowicz, English Centre, University of Hong Kong

This article is divided into two main sections. Following the introduction, Section II takes a look at the concept of authenticity and the way this notion has evolved in language testing and more recently in general education. It argues that, although our understanding of the notion of authenticity has developed considerably since it was first introduced into the language testing literature in the 1970s, many questions remain unanswered. In an attempt to address one of the outstanding issues, Section III presents a study looking at the importance of authenticity for test takers. It shows that test takers are willing and able to identify the attributes of a test likely to affect their performance. However, these attributes do not necessarily include authenticity, which has hitherto been considered an important test attribute for all stakeholders in the testing process. The article concludes that much more research is needed if the nature and role of authenticity in language testing is to be fully understood.

Address for correspondence: Jo Lewkowicz, Associate Professor, English Centre, The University of Hong Kong, Pokfulam Road, Hong Kong; e-mail: jolewkow@hkusua.hku.hk
Language Testing 2000 17(1) 43–64. 0265-5322(00)LT171OA © 2000 Arnold

I Introduction
Any attempt at the characterization of authenticity in relation to assessment theory or practice needs to acknowledge that the notion of authenticity has been much debated both within the fields of applied linguistics as well as general education. In applied linguistics the notion emerged in the late 1970s at the time when communicative methodology was gaining momentum and there was a growing interest in teaching and testing 'real-life' language. In general education, on the other hand, it took more than another decade before the notion gained recognition. Since then there has been much overlap in the way the term has been perceived in both fields, yet the 'debates' have remained largely independent of each other even to the extent that, in a recent article, Cumming and Maxwell (1999: 178) attribute '[t]he first formal use of the term "authentic" in the context of learning and assessment ... to Archbald and Newmann (1988)' (my emphasis). Although many different interpretations of authenticity and authentic assessment have emerged, one feature of authenticity upon which there has been general agreement over time is that it is an important quality for test development which 'carries a positive charge' (Lynch, 1982: 11). Morrow (1991: 112), in his discussions of communicative language testing, pointed to 'the overriding importance of authenticity', while for Wood (1993) it is one of the most important issues in language testing. Wood (1993: 233) has proposed that there are two major issues – those of validity and reliability – and that they 'coalesce into one even greater issue: authenticity vs.
inauthenticity'. Bachman and Palmer (1996), too, see authenticity as crucial. They argue that it is 'a critical quality of language tests' (p. 23), one that 'most language test developers implicitly consider in designing language tests' (p. 24). Authenticity is also pivotal to Douglas' (1997) consideration of specific purpose tests in that it is one of two features which distinguishes such tests from more general purpose tests of language (the other feature being the interaction between language knowledge and specific purpose content knowledge). The same positive sentiment is echoed in the field of general education where authentic assessment has been 'embraced enthusiastically by policy-makers, curriculum developers and practitioners alike', being seen as 'a desirable characteristic of education' (Cumming and Maxwell, 1999: 178).

Despite the importance accorded to authenticity, there has been a marked absence of research to demonstrate this characteristic. It is clear that authenticity is important for assessment theorists, but this may not be the case for all stakeholders in the testing process. It is not known, for example, how test takers perceive authenticity. It may be that authenticity is variably defined by the different stakeholders. It is also unclear whether the presence or absence of authenticity will affect test takers' performance. Bachman and Palmer (1996: 24) suggest that authenticity has a potential effect on test takers' performance. However, this effect is among those features of authenticity which have to be demonstrated if we are to move from speculation about the nature of authenticity to a comprehensive characterization of the notion. Before this can be achieved, a research agenda informed by our current understanding of authenticity and an identification of the unresolved issues needs to be drawn up. To this end, this article first reviews the current authenticity debate within the field of language testing and relates it to the debate in general education. It identifies a range of questions which need to be attended to for a better understanding of authenticity to be achieved. The article then goes on to outline in some detail a study which sets out to address one of the questions identified, and to suggest that there is not only a need for, but also value in, a systematic investigation of authenticity.

II The authenticity debate
1 The early debate
In applied linguistics the term 'authenticity' originated in the mid 1960s with a concern among materials writers such as Close (1965) and Broughton (1965) that language learners were being exposed to texts which were not representative of the target language they were learning. Close (1965), for example, stressed the authenticity of his materials in the title of his book The English we use for science, which utilized a selection of published texts on science from a variety of sources and across a range of topics. Authenticity at the time was seen as a simple notion distinguishing texts extracted from 'real-life' sources from those written for pedagogical purposes.

It was not until the late 1970s that Widdowson initiated a debate on the nature of authenticity. He introduced the distinction between 'genuineness' and 'authenticity' of language arguing that:

Genuineness is a characteristic of the passage itself and is an absolute quality. Authenticity is a characteristic of the relationship between the passage and the reader and has to do with appropriate response. (Widdowson, 1978: 80)

Widdowson (1979: 165) saw genuineness as a quality of all texts, while authenticity as an
attribute 'bestowed' on texts by a given audience. In his view, authenticity was a quality of the outcome present if the audience could realize the author's intentions, which would only be possible where the audience was aware of the conventions employed by the writer or speaker (Widdowson, 1990). He argued that genuine texts would only be considered authentic after undergoing a process of authentication, a process which he suggested may only be truly accessible to the native speaker. He failed to account for the way language learners could progress towards being able to authenticate texts, or to describe the native speaker. However, in distinguishing between genuineness and authenticity, Widdowson drew attention to the importance of the interaction between the audience and the text and hence to the nature of the outcome arising from textual input.

The distinction between genuine and authentic language was not readily accepted (a point recently lamented by Widdowson himself; Widdowson, 1998), and the discussion of authenticity remained for some time focused on the nature of authentic input. This was equally true in language teaching as it was in the field of language testing. Those advocating change to pre-communicative testing practices (such as Rea, 1978; Morrow 1978; 1979; 1983; 1991; Carroll, 1980) equated authenticity with what Widdowson identified as genuine input and focused on the need to use texts that had not been simplified and tasks that simulated those that test takers would be expected to perform in 'the real world' outside the language classroom.

This understanding of authenticity, detailed in Morrow's ground-breaking report of 1978, gradually began to filter through to language testing practices of the 1980s and 1990s. In 1981, for example, in response to Morrow's (1978) report, the Royal Society of Arts introduced the Communicative Use of English as a Foreign Language examination. This was the first large-scale test to focus on students' ability to use language in context (language use) rather than their knowledge about a language (language usage) (Hargreaves, 1987); it was also the precursor to the Certificates in Communicative Skills in English introduced in the 1990s by the University of Cambridge Local Examination Syndicate (UCLES). Both these tests were premised on the belief that authentic stimulus material was a necessary component of any test of communicative language ability. The same premise informed the development of many other tests, particularly in situations where oral language was being assessed and simulations of real-life tasks became a part of direct tests of spoken ability (e.g. Oral Proficiency Interviews) and where language for specific purposes was being assessed, such as in the British Council/UCLES English Language Testing Service (ELTS) test battery (for more detail, see Alderson et al., 1987).

This conceptualization of authenticity gave rise, however, to a number of theoretical and practical concerns. First, by equating authenticity with texts that had not been altered or simplified in any way, a dichotomy was created between 'authentic texts' that were seen as intrinsically 'good' and 'inauthentic texts' produced for pedagogic purposes which were seen as 'inferior'. This dichotomy proved unhelpful since it tended to ignore a number of salient features of real-life discourse. Texts produced in the real world differ (inter alia) in complexity depending on their intended audience and the amount of shared information between the parties involved in the discourse.
Not all native speakers necessarily understand all texts (Seliger, 1985). Learning to deal with simple texts may, therefore, be a natural stage in the learning process and one that students need to go through (Widdowson, 1979; Davies, 1984). Using such texts in a test situation may similarly be considered the most appropriate for the language level of the test takers and, hence, may be totally justified. In addition, every text is produced in a specific context and the very act of extracting a text from its original source, even if it is left in its entirety, could be said to 'disauthenticate' it since authenticity, according to Widdowson (1994: 386), is 'non-transferable'. In a test situation where, as Davies (1988) points out, it may not be possible or even practical to use unadapted texts, an obvious dilemma arises. How should such a text be regarded: authentic because it has been taken from the real world, or inauthentic as it has been extracted from its original context for test use?

Another area of concern related to the view that authentic test tasks were those which mirrored real-life tasks. Such tasks are, by their very nature, simulations which cannot give rise to genuine interaction. They can, at best, be made to look like real-life tasks (Spolsky, 1985). Test takers need to cooperate and be willing to abide by the 'rules of the game' if simulations are to be successful in testing situations, otherwise the validity and fairness of the assessment procedures remain suspect (Spolsky, 1985). In addition, real-life holistic tasks do not necessarily lend themselves to test situations. Only a limited number of such performance-type tasks can be selected for any given test; additionally, the question of task selection for generalizations to be made from test to non-test performance seems never to have been adequately resolved. Morrow (1979) suggested characterizing each communicative task by the enabling skills needed to complete it and then determining the tasks by deciding on which enabling skills should be tested. This approach, as Alderson noted (1981), assumed that enabling skills can be identified. It also encouraged the breaking down of holistic tasks into more discrete skills, which Morrow (1979) himself recognized as problematic since a 'candidate may prove quite capable of handling individual enabling skills, and yet prove quite incapable of mobilizing them in a use situation' (p. 153).

Throughout the 1980s the authenticity debate remained firmly focused on the nature of test input with scant regard being paid to the role test takers play in processing such input. The debate centred on the desired qualities of those aspects of language tests which test setters control, with advocates of authenticity promulgating the use of texts and tasks taken from real-life situations (Morrow, 1979; Carroll, 1980; Doye, 1991), and the sceptics drawing attention to the limitations of using such input and to the drawbacks associated with equating such input with real-life language use (Alderson, 1981; Davies, 1984; Spolsky, 1985).

2 A reconceptualization of authenticity
In language teaching the debate was taken forwards by Breen (1985) who suggested that authenticity may not be a single unitary notion, but one relating to texts (as well as to learners' interpretation of those texts), to tasks and to social situations of the language classroom.
Breen drew attention to the fact that the aim of language learning is to be able to interpret the meaning of texts, and that any text which moves towards achieving that goal could have a role in teaching. He proposed that the notion of authenticity was a fairly complex one and that it was oversimplistic to dichotomize authentic and inauthentic materials, particularly since authenticity was, in his opinion, a relative rather than an absolute quality.

Bachman, in the early 1990s, appears to have built on the ideas put forward by Widdowson and Breen. He suggested that there was a need to distinguish between two types of authenticity: situational authenticity – that is, the perceived match between the characteristics of test tasks to target language use (TLU) tasks – and interactional authenticity – that is, the interaction between the test taker and the test task (Bachman, 1991). In so doing, he acknowledged that authenticity involved more than matching test tasks to TLU tasks: he saw authenticity also as a quality arising from the test takers' involvement in test tasks. Bachman (1991) appeared, at least in part, to be reaffirming Widdowson's notion of authenticity as a quality of outcome arising from the processing of input, but at the same time pointing to a need to account for 'language use' which Widdowson's unitary definition of genuineness did not permit.

Like Breen (1985), Bachman (1990; 1991) also recognized the complexities of authenticity, arguing that neither situational nor interactional authenticity was absolute. A test task could be situationally highly authentic, but interactionally low on authenticity, or vice versa. This reconceptualization of authenticity into a complex notion pertaining to test input as well as the nature and quality of test outcome was not dissimilar to the view of authenticity emerging in the field of general education. In the United States, in particular, the late 1980s/early 1990s saw a movement away from standardized multiple-choice tests to more performance-based assessment characterized by assessment tasks which were holistic, which provided an intellectual challenge, which were interesting for the students and which were tasks from which students could learn (Carlson, 1991: 6). Of concern was not only the nature of the task, but the outcome arising from it. Although there was no single view of what constituted authentic assessment, there appears to have been general agreement that a number of factors would contribute to the authenticity of any given task. (For an overview of how learning theories determined interpretation of authentic assessment, see Cumming and Maxwell, 1999.) Furthermore, there was a recognition, at least by some (for example, Anderson et al. (1996), cited by Cumming and Maxwell, 1999), that tasks would not necessarily be either authentic or inauthentic but would lie on a continuum which would be determined by the extent to which the assessment task related to the context in which it would be normally performed in real-life. This construction of authenticity as being situated within a specific context can be compared to situational authenticity discussed above.

3 A step forward?
The next stage in the authenticity debate appears to have moved in a somewhat different direction. In language education, Bachman in his work with Palmer (1996) separated the notion of authenticity from that of interactiveness, defining authenticity as 'The degree of correspondence of the characteristics of a given language test task to the features of a TLU task' (Bachman and
Palmer, 1996: 23). This definition corresponds to that of situational authenticity, while interactiveness replaced what was previously termed interactional authenticity. The premise behind this change was a recognition that all real-life tasks are by definition situationally authentic, so authenticity can only be an attribute of other tasks, that is, those used for testing or teaching. At the same time, not all genuine language tasks are equally interactive; some give rise to very little language. However, authenticity is in part dependent on the correspondence between the interaction arising from test and TLU tasks; regarding the two as separate entities may, therefore, be misleading. Certainly, Douglas (2000) continues to see the two as aspects of authenticity, arguing that both need to be present in language tests for specific purposes.

To approximate the degree of correspondence between test and TLU tasks – that is, to determine the authenticity of test tasks – Bachman and Palmer (1996) proposed a framework of task characteristics. This framework provides a systematic way of matching tasks in terms of their setting, the test rubrics, test input, the outcome the tasks are expected to give rise to, and the relationship between input and response (for the complete framework, see Bachman and Palmer, 1996: 49–50). The framework is important since it provides a useful checklist of task characteristics, one which allows for a degree of agreement among test developers interested in ascertaining the authenticity of test tasks. It takes into account both the input provided in a test as well as the expected outcome arising from the input by characterizing not only test tasks but also test takers' interactions with these.

4 Outstanding questions
Operationalizing the Bachman and Palmer (1996) framework does, however, pose a number of challenges. To determine the degree of correspondence between test tasks and TLU tasks, it is necessary to 'first identify the critical features that define tasks in the TLU domain' (Bachman and Palmer, 1996: 24). How this is to be achieved is not clear. Identifying critical features of TLU tasks appears to require judgements which may be similar to those needed to identify enabling skills of test and non-test tasks. Once such judgements have been made, test specifications need to be implemented and, in the process of so doing, the specifications may undergo adjustment. This is particularly likely to happen during test moderation when, as recent research has revealed (Lewkowicz, 1997), considerations other than maintaining a desired degree of correspondence between test and non-test task tend to prevail. It must be remembered that test development is an evolutionary process during which changes and modifications are likely to be continually introduced. Such changes may, ultimately, even if unintentionally, affect the degree of correspondence between the test tasks and TLU tasks. In other words, the degree of authenticity of the resultant test tasks may fail to match the desired level of authenticity identified at the test specification stage.

Whether in reality such differences in the degree of correspondence between a test task and TLU tasks are significant remains to be investigated. It is possible that if one were to consider all the characteristics for each test task in relation to possible TLU tasks (a time-consuming process), then the differences in authenticity across test tasks might be negligible. Some tasks could display a considerable degree of authenticity in terms of input while
others could display the same degree of authenticity only in terms of output, situation or any combination of such factors. None would feature as highly authentic in terms of rubric since this is likely to 'be a characteristic for which there is relatively little correspondence between language use tasks and test tasks' (Bachman and Palmer, 1996: 50).

The above issues, all of which relate to the problem of identifying critical task characteristics, give rise to a number of unresolved questions:
1) Which characteristics are critical for distinguishing authentic from non-authentic test tasks?
2) Are some of these characteristics more critical than others?
3) What degree of correspondence is needed for test tasks and TLU tasks to be perceived as authentic?
4) How can test developers ensure that the critical characteristics identified at the test specification stage are present in the resultant test tasks and not 'eroded' in the process of test development?

An underlying assumption which underpins the Bachman and Palmer framework is that TLU tasks can be characterized. This, however, may not always be possible or practical. In situations where learners have homogeneous needs and where they are learning a language for specific purposes, identifying and characterizing the TLU domain may be a realistic endeavour. Douglas (2000) suggests this to be the case. However, in circumstances where learners' needs are diverse and test setters have a very large number of TLU tasks to draw upon, such characterization of all TLU tasks may be unrealistic. Even if such a characterization were possible, it may not necessarily prove useful. The large number of TLU tasks characterized could ensure a level of authenticity for most test tasks selected, since the larger the number of TLU tasks to choose from, the more likely it is that there would be a level of correspondence between the test tasks and the TLU domain. This leads to the following questions:
5) Can critical characteristics be identified for all tests, that is, general purpose as well as specific purpose language tests?
6) If so, do they need to be identified for both general and specific purpose tests?

A third set of questions relates to test outcome: whether test tasks that correspond highly to TLU tasks in terms of task characteristics are perceived as authentic by stakeholders other than the test developers. There has been some research in this area to suggest that end-users may prove useful informants for determining the degree to which test tasks are perceived as authentic. In a study investigating oral discourse produced in response to prompts given as part of the Occupational English Test for Health Professionals, Lumley and Brown (1998) found that their professional informants perceived the tasks set as authentic. However, they also found that the tasks gave rise to a number of problems which restricted the authenticity of the language produced, that is, of the test outcome. They found that the role cards given to the test takers provided insufficient background information about 'their patient'. As a result, when discussing the patient's condition with the examiner (playing the role of a concerned relative), the test takers failed to sound convincing and authoritative.
This would suggest that authenticity is made up of constituent parts such as authenticity of input, purpose and outcome, leading to the questions:
7) What are the constituents of test authenticity, and are each of the constituents equally important?
8) Does the interaction arising from test tasks give rise to that intended by the test developers?
9) To what extent can/do test tasks give rise to authentic-sounding output which allow for generalizations to be made about test takers' performance in the real world?

The final set of questions to be considered relate to stakeholder perceptions of the importance of authenticity. It has already been suggested that the significance of authenticity may be variably perceived among and between different groups of stakeholders. It is, for example, possible that perceived authenticity plays an important role in test takers' performance, as Bachman and Palmer (1996) propose. However, it is conceivable that authenticity is important for some – not all – test takers and only under certain circumstances. It is equally possible that authenticity is not important for test takers, but it is important for other stakeholders such as teachers preparing candidates for a test (see Section III). We need to address the following questions if we are to ascertain the importance of test authenticity:
10) How important is authenticity for the various stakeholders of a test?
11) How do perceptions of authenticity differ among and between different stakeholders of a test?
12) Does a perception of authenticity affect test takers' performance and, if so, in what ways?
13) Does the importance attributed to authenticity depend on factors such as test takers' age, language proficiency, educational level, strategic competence or purpose for taking a test (whether it is a high or low stakes test)?
14) Will perceived authenticity impact on classroom practices and if so, in what way(s)?

In relation to the final question (14), it is worth noting the marked absence of authenticity in discussions of washback (the impact of tests on teaching). The close tie drawn between authentic achievement and authentic assessment in educational literature implies a mutual dependence. Cumming and Maxwell (1999) go as far as to suggest that there is a tension between four factors – learning goals, learning processes, teaching activities and assessment procedures – all of which are in 'dynamic tension' and 'adjustment of one component requires sympathetic adjustment of the other three' (p. 179). Yet, literature on washback in applied linguistics fails to acknowledge this relationship. Wall (1997), for example, in her overview of washback does not mention the potential impact of test authenticity on classroom practices. Similarly, Alderson et al. (1995) – in considering the principles which underlie actual test construction for major examination boards in Britain – do not identify authenticity as an issue.

Authenticity, as the above overview suggests, has been much debated in the literature. In fact, there have been two parallel debates on authenticity which have remained largely ignorant of each other. Discussions within the field of applied linguistics and general education – as Lewkowicz (1997) suggests – need to come closer together. Furthermore, such discussions need to be empirically based to inform what has until now been a predominantly theoretical debate.
The questions identified earlier demonstrate that there is still much that is unknown about authenticity. As Peacock (1997: 44) has argued with reference to language teaching: 'research to date on this topic is inadequate, and further research is justified by the importance accorded authentic materials in the literature'. This need is equally true for language testing, which is the primary focus of this article. One aspect of authenticity which has been subject to considerable speculation, but which has remained under-researched, is related to test takers' perceptions of authenticity. The following study was set up to understand more fully the importance test takers accord to this test characteristic, and to determine whether their perceptions of authenticity affect their performance on a test.

III The study
1 The subjects
A group of 72 first-year students from the University of Hong Kong were identified for this study. They were all first-year undergraduate students taking an English enhancement course as part of their degree curriculum. The students were Cantonese speakers between 18 and 20 years of age. All had been learning English for 13 or more years.

2 The tests
The students were given two language tests within a period of three weeks. The two tests were selected because they were seen as very different in terms of their authenticity. The test administered first to all the students was a 90-item multiple-choice test based on a TOEFL practice test. It was made up of four sections: sentence structure (15 items), written expression (25 items), vocabulary (20 items) and reading comprehension (30 items). The students were familiar with this type of test as they had taken similar multiple-choice tests throughout their school career; additionally, part of the Use of English examination, which is required for university entrance, is made up of multiple-choice items. The second test administered was an EAP (English for academic purposes) test which, in terms of Bachman and Palmer's (1996) test task characteristics, was perceived as being reasonably authentic. Students took the test at the end of their English enhancement course,
Dynamic Capabilities and Strategic Management
How do firms compete? How do they earn above-normal returns? What is needed to sustain superior performance over the long run? An increasingly powerful answer to these fundamental questions of business strategy lies in the concept of dynamic capabilities.

These are the skills, procedures, routines, organizational structures, and disciplines that enable a firm to build, employ, and coordinate intangible assets relevant to satisfying customer needs, and that cannot easily be replicated by competitors.

Firms with strong dynamic capabilities are intensely entrepreneurial. They not only adapt to business ecosystems; they also shape them through innovation, collaboration, learning, and involvement.

David Teece is the pioneer of the dynamic capabilities perspective, which is rooted in his 25 years of research, teaching, and consulting. His ideas have been influential in business strategy, management, and economics, and are relevant to innovation, technology management, and competition policy. Through his advisory and consulting work he has also brought these ideas to bear on business and policy making around the world.

This book is the clearest and most succinct statement of the core ideas of dynamic capabilities. Teece explains their genesis and applications, and how they offer an alternative to much conventional strategic thinking founded on simplistic and outdated understandings of industrial organization and the foundations of competitive advantage. Accessibly written and presented, it will be an invaluable and stimulating tool for all those who want to understand this important contribution to strategic thinking, be they MBA students, academics, managers, or consultants.

Strategic Management Journal, Vol. 18:7, 509–533 (1997)
The dynamic capabilities framework analyzes the sources and methods of wealth creation and capture by private enterprise firms operating in environments of rapid technological change. The competitive advantage of firms is seen as resting on distinctive processes (ways of coordinating and combining), shaped by the firm's (specific) asset positions (such as the firm's portfolio of difficult-to-trade knowledge assets and complementary assets), and the evolution path(s) it has adopted or inherited. The importance of path dependencies is amplified where conditions of increasing returns exist. Whether and how a firm's competitive advantage is eroded depends on the stability of market demand, and the ease of replicability (expanding internally) and imitatability (replication by competitors). If correct, the framework suggests that private wealth creation in regimes of rapid technological change depends in large measure on honing internal technological, organizational, and managerial processes inside the firm. In short, identifying new opportunities and organizing effectively and efficiently to embrace them are generally more fundamental to private wealth creation than is strategizing, if by strategizing one means engaging in business conduct that keeps competitors off balance, raises rival's costs, and excludes new entrants. (C) 1997 by John Wiley & Sons, Ltd.
IEEE Std 1222™-2004
IEEE Standard for All-Dielectric Self-Supporting Fiber Optic Cable
IEEE Power Engineering Society
Sponsored by the Power System Communications Committee
30 July 2004
Recognized as an American National Standard (ANSI)

The Institute of Electrical and Electronics Engineers, Inc.
3 Park Avenue, New York, NY 10016-5997, USA
Copyright © 2004 by the Institute of Electrical and Electronics Engineers, Inc. All rights reserved. Published 30 July 2004. Printed in the United States of America.

IEEE Std 1222™-2003
Sponsor: Power System Communications Committee of the IEEE Power Engineering Society
Approved 31 March 2004, American National Standards Institute
Approved 10 December 2003, IEEE-SA Standards Board

Abstract: Construction, mechanical, electrical, and optical performance, installation guidelines, acceptance criteria, test requirements, environmental considerations, and accessories for an all-dielectric, nonmetallic, self-supporting fiber optic (ADSS) cable are covered in this standard. The ADSS cable is designed to be located primarily on overhead utility facilities. This standard provides both construction and performance requirements that ensure, within the guidelines of the standard, that the dielectric capabilities of the cable components and maintenance of optical fiber integrity and optical transmissions are proper. This standard may involve hazardous materials, operations, and equipment. It does not purport to address all of the safety issues associated with its use, and it is the responsibility of the user to establish appropriate safety and health practices and to determine the applicability of regulatory limitations prior to use.

Keywords: aeolian vibration, aerial cables, all-dielectric self-supporting (ADSS), buffer, cable reels, cable safety, cable thermal aging, dielectric, distribution lines, electric fields, electrical stress, fiber optic cable, galloping, grounding, hardware, high voltage, optical ground wire (OPGW), plastic cable, sag and tension, self-supporting, sheave test, span length, string procedures, temperature cycle test, tracking, transmission lines, ultraviolet (UV) deterioration
Introduction
(This introduction is not a part of IEEE Std 1222-2003, IEEE Standard for All-Dielectric Self-Supporting Fiber Optic Cable.)

All-dielectric self-supporting (ADSS) fiber optic cables are being installed throughout the power utility industry. Because of the unique service environment and design of these cables, many new requirements are necessary to ensure proper design and application of these cables. In order to develop an industry-wide set of requirements and tests, the Fiber Optic Standards Working Group, under the direction of the Fiber Optic Subcommittee of the Communications Committee, brought together the expertise of key representatives from throughout the industry. These key people are from each manufacturer of ADSS cables and a cross section of the end users. All manufacturers and all known users were invited to participate in preparing this standard.

The preparation of this standard occurred over a period of several years, and participation changed throughout that time as companies and individuals changed interests and positions. Effort was always made to include key individuals from each and every manufacturing concern, major user groups, and consulting firms. Membership and participation was open to everyone who had an interest in the standard, and all involvement was encouraged. This worldwide representation helps to ensure that this standard reflects the entire industry.

As ADSS fiber optic cables are a new and changing technology, the working group is continuing to work on new revisions to this standard as the need arises.

Notice to users
Errata
Errata, if any, for this and all other standards can be accessed at the following URL: http:// /reading/ieee/updates/errata/index.html. Users are encouraged to check this URL for errata periodically.
Interpretations
Current interpretations can be accessed at the following URL: /reading/ieee/interp/index.html.
Patents
Attention is called to the possibility that implementation of this standard may require use of subject matter covered by patent rights. By publication of this standard, no position is taken with respect to the existence or validity of any patent rights in connection therewith. The IEEE shall not be responsible for identifying patents or patent applications for which a license may be required to implement an IEEE standard or for conducting inquiries into the legal validity or scope of those patents that are brought to its attention.
Participants
During the preparation of this standard, the Fiber Optic Standards Working Group had the following membership:
William A. Byrd, Chair
Robert E. Bratton, Co-Chair
Philip Adelizzi, Hiroji Akasaka, Tom Alderton, Dave Bouchard, Mark Boxer, Terrence Burns, Kurt Dallas, Paul Daniels, William DeWitt, Gary Ditroia, Robert Emerson, Trey Fleck, Denise Frey, Henry Grad, Jim Hartpence, Claire Hatfield, John Jones, Tommy King, Konrad Loebl, John MacNair, Andrew McDowell, Tom Newhart, Serge Pichot, Craig Pon, Jim Puzan, Joe Renowden, William Rich, Tewfik Schehade, John Smith, Matt Soltis, Dave Sunkel, Alexander Torres, Monty Tuominen, Jan Wang, Tim West, Eric Whitham

The following members of the individual balloting committee voted on this standard. Balloters may have voted for approval, disapproval, or abstention.
Wole Akpose, Thomas Blair, Al Bonnyman, Stuart Bouchey, Mark Boxer, Robert Bratton, Terrence Burns, William A. Byrd, Manish Chaturvedi, Ernest Duckworth, Amir El-Sheikh, Robert Emerson, Denise Frey, Jerry Goerz, Brian G. Herbst, Edward Horgan, Mihai Ioan, David Jackson, Pi-Cheng Law

When the IEEE-SA Standards Board approved this standard on 10 December 2003, it had the following membership:
Don Wright, Chair
Howard M. Frazier, Vice Chair
Judith Gorman, Secretary
H. Stephen Berger, Joe Bruder, Bob Davis, Richard DeBlasio, Julian Forster*, Toshio Fukuda, Arnold M. Greenspan, Raymond Hapeman, Donald M. Heirman, Laura Hitchcock, Richard H. Hulett, Anant Jain, Lowell G. Johnson, Joseph L. Koepfinger*, Tom McGean, Steve Mills, Daleep C. Mohla, William J. Moylan, Paul Nikolich, Gary Robinson, Malcolm V. Thaden, Geoffrey O. Thompson, Doug Topping, Howard L. Wolfman
*Member Emeritus

Also included are the following nonvoting IEEE-SA Standards Board liaisons:
Satish K. Aggarwal, NRC Representative
Richard DeBlasio, DOE Representative
Alan Cookson, NIST Representative

Savoula Amanatidis, IEEE Standards Managing Editor

Contents
1. Overview
  1.1 Scope
2. ADSS cable and components
  2.1 Description
  2.2 Support systems
  2.3 Fiber optic cable core
  2.4 Optical fibers
  2.5 Buffer construction
  2.6 Color coding
  2.7 Jackets
3. Test requirements
  3.1 Cable tests
  3.2 Fiber tests
4. Test methods
  4.1 Cable tests
  4.2 Fiber tests
5. Sag and tension list
6. Field acceptance testing
  6.1 Fiber continuity
  6.2 Attenuation
  6.3 Fiber length
7. Installation recommendations
  7.1 Installation procedure for ADSS
  7.2 Electric field strength
  7.3 Span lengths
  7.4 Sag and tension
  7.5 Stringing sheaves
  7.6 Maximum stringing tension
  7.7 Handling
  7.8 Hardware and accessories
  7.9 Electrical stress
8. Cable marking and packaging requirements
  8.1 Reels
  8.2 Cable end requirements
  8.3 Cable length tolerance
  8.4 Certified test data
  8.5 Reel tag
  8.6 Cable marking
  8.7 Cable remarking
  8.8 Identification marking
  8.9 SOCC
Annex A (informative) Electrical test
Annex B (informative) Aeolian vibration test
Annex C (informative) Galloping test
Annex D (informative) Sheave test (ADSS)
Annex E (informative) Temperature cycle test
Annex F (informative) Cable thermal aging test
Annex G (informative) Bibliography
1. Overview
1.1 Scope
This standard covers the construction, mechanical, electrical, and optical performance, installation guidelines, acceptance criteria, test requirements, environmental considerations, and accessories for an all-dielectric, nonmetallic, self-supporting fiber optic (ADSS) cable. The ADSS cable is designed to be located primarily on overhead utility facilities.
The standard provides both construction and performance requirements that ensure, within the guidelines of the standard, that the dielectric capabilities of the cable components and maintenance of optical fiber integrity and optical transmissions are proper.
This standard may involve hazardous materials, operations, and equipment. This standard does not purport to address all of the safety issues associated with its use. It is the responsibility of the user of this standard to establish appropriate safety and health practices and to determine the applicability of regulatory limitations prior to use.

2. ADSS cable and components
2.1 Description
The ADSS cable shall consist of coated glass optical fibers contained in a protective dielectric fiber optic unit surrounded by or attached to suitable dielectric strength members and jackets. The cable shall not contain metallic components. The cable shall be designed to meet the design requirements of the optical cable under all installation conditions, operating temperatures, and environmental loading.
2.2 Support systems
a) ADSS cable shall contain support systems that are integral to the cable. The purpose of the support system is to ensure that the cable meets the optical requirements under all specified installation conditions, operating temperatures, and environmental loading for its design life. This standard excludes any "lashed" type of cables.
b) The basic annular construction may have aramid or other dielectric strands or a channeled dielectric rod as a support structure. In addition, other cable elements, such as central members, may be load bearing.
c) Figure-8 constructions may have a dielectric messenger and a fiber optic unit, both of which share a common outer jacket. In addition, other cable elements, such as central members, may be load bearing.
d) Helically stranded cable systems may consist of a dielectric optical cable prestranded around a dielectric messenger.
e) The design load of the cable shall be specified so that support hardware can be manufactured to perform under all environmental loading conditions. For zero fiber strain cable designs, the design load is defined as the load at which the optical fibers begin to elongate. For other cable designs, the design load is defined as the load at which the measured fiber strain reaches a predetermined level.
f) Other designs previously not described are not excluded from this specification.
Maximum allowable fiber strain will generally be a function of the proof test level and the strength and fatigue parameters of the coated glass fiber.

2.3.2 Central structural element

If a central structural element is necessary, it shall be of reinforced plastic, epoxiglass, or other dielectric material. If required, this element shall provide the necessary tensile strength to limit axial stress on the fibers and minimize fiber buckling due to cable contraction at low temperatures.

2.3.3 Buffer tube filling compound

Loose buffer tubes shall be filled with a suitable compound compatible with the tubing material, fiber coating, and coloring to protect the optical fibers and prevent moisture ingress.

2.3.4 Cable core filling/flooding compound

The design of the cable may include a suitable filling/flooding compound in the interstices to prohibit water migration along the fiber optic cable core. The filling compound shall be compatible with all components with which it may come in contact.

2.3.5 Binder/tape

A binder yarn(s) and/or a layer(s) of overlapping nonhygroscopic tape(s) may be used to hold the cable core elements in place during application of the jacket.

2.3.6 Inner jacket

A protective inner jacket or jackets of a suitable material may be applied over the fiber optic cable core, isolating the cable core from any external strength elements and the cable outer jacket.

2.4 Optical fibers

Single-mode fibers (dispersion-unshifted, dispersion-shifted, or nonzero dispersion-shifted) and multimode fibers with 50/125 µm or 62.5/125 µm core/clad diameters are considered in this standard. The core and the cladding shall consist of glass that is predominantly silica (SiO2). The coating, usually made from one or more plastic materials or compositions, shall be provided to protect the fiber during manufacture, handling, and use.

2.5 Buffer construction

The individually coated optical fiber(s) or fiber ribbon(s) may be surrounded by a buffer for protection from physical damage during fabrication, installation, and performance of the ADSS. Loose buffer and tight buffer construction are two types of protection that may be used to isolate the fibers. The fiber coating and buffer shall be strippable for splicing and termination.

2.5.1 Loose buffer

Loose buffer construction shall consist of a tube or channel that surrounds each fiber or fiber group. The inside of the tube or channel shall be filled with a filling compound.

2.5.2 Tight buffer construction

Tight buffer construction shall consist of a suitable material that comes in contact with the coated fiber.

2.6 Color coding

Color coding is essential for identifying individual optical fibers and groups of optical fibers. The colors shall be in accordance with TIA/EIA 598-A-1995 [B43]. (The numbers in brackets correspond to those of the bibliography in Annex G.)

2.6.1 Color performance

The original color coding system shall be discernible and permanent, in accordance with EIA 359-A-1985 [B3], throughout the design life of the cable, when cleaned and prepared per the manufacturer's recommendations.

2.7 Jackets

The outer jacket shall be designed to house and protect the inner elements of the cable from damage due to moisture, sunlight, and environmental, thermal, mechanical, and electrical stresses.

a) The jacket material shall be dielectric, non-nutrient to fungus, and meet the requirements of 3.1.1.13.
The jacket material may consist of a polyethylene that shall contain carbon black and an antioxidant.

b) The jacket shall be extruded over the underlying element and shall be of uniform diameter to properly fit support hardware. The extruded surface shall be smooth for minimal ice buildup.

c) The cable jacket shall be suitable for application in electrical fields as defined in this clause and demonstrated in 3.1.1.3.

Class A: Where the level of electrical stress on the jacket does not exceed 12 kV space potential.
Class B: Where the level of electrical stress on the jacket may exceed 12 kV space potential.

NOTE—See 7.9 for additional deployment details. (Notes in text, tables, and figures are given for information only and do not contain requirements needed to implement the standard.)

3. Test requirements

Each requirement in this clause is complementary to the corresponding paragraph in Clause 4 that describes a performance verification or test procedure.

3.1 Cable tests

3.1.1 Design tests

An ADSS cable shall successfully pass the following design tests. However, design tests may be waived at the option of the user if an ADSS cable of identical design has been previously tested to demonstrate the capability of the manufacturer to furnish cable with the desired performance characteristics.

3.1.1.1 Water blocking test

A water block test for cable shall be performed in accordance with 4.1.1.1. No water shall leak through the open end of the 1 m sample. If the first sample fails, one additional 1 m sample, taken from a section of cable adjacent to the first sample, may be tested for acceptance.

3.1.1.2 Seepage of filling/flooding compound

For filled/flooded fiber optic cable, a seepage of filling/flooding compound test shall be performed in accordance with 4.1.1.2. The filling and flooding compound shall not flow (drip or leak) at 65 °C.

3.1.1.3 Electrical tests

Electrical tests shall be performed for Class B cables in accordance with 4.1.1.3. Tracking on the outside of the sheath resulting in erosion that at any point exceeds 50% of the wall thickness shall constitute a failure.

3.1.1.4 Aeolian vibration test

An aeolian vibration test shall be carried out in accordance with 4.1.1.4. Any damage that will affect the mechanical performance of the cable, or that causes a permanent or temporary increase in optical attenuation greater than 1.0 dB/km of the tested fibers at 1550 nm for single-mode fibers and at 1300 nm for multimode fibers, shall constitute failure.

3.1.1.5 Galloping test

A galloping test shall be carried out in accordance with 4.1.1.5. Any damage that will affect the mechanical performance of the cable, or that causes a permanent or temporary increase in optical attenuation greater than 1.0 dB/km of the tested fibers at 1550 nm for single-mode fibers and at 1300 nm for multimode fibers, shall constitute failure.

3.1.1.6 Sheave test

A sheave test shall be carried out in accordance with 4.1.1.6. Any significant damage to the ADSS cable shall constitute failure.
A permanent increase in optical attenuation greater than 1.0 dB/km of the tested fibers at 1550 nm for single-mode fibers and at 1300 nm for multimode fibers shall constitute failure.

Alternatively, successful completion of the following three tests may be substituted for the sheave test:

a) Tensile strength of a cable: The maximum increase in attenuation shall not be greater than 0.10 dB for single-mode and 0.20 dB for multimode fibers when the cable is subjected to the maximum cable rated tensile load.

b) Cable twist: The cable shall be capable of withstanding mechanical twisting without experiencing an average increase in attenuation greater than 0.10 dB for single-mode and 0.20 dB for multimode fibers.

c) Cable cyclic flexing: The cable sample shall be capable of withstanding mechanical flexing without experiencing an average increase in attenuation greater than 0.10 dB for single-mode and 0.20 dB for multimode fibers.

3.1.1.7 Crush test and impact test

3.1.1.7.1 Crush test

A crush test shall be performed in accordance with 4.1.1.7.1. A permanent or temporary increase in optical attenuation greater than a 0.2 dB change in the sample at 1550 nm for single-mode fibers and 0.4 dB at 1300 nm for multimode fibers shall constitute failure.

3.1.1.7.2 Impact test

An impact test shall be performed in accordance with 4.1.1.7.2. A permanent increase in optical attenuation greater than a 0.2 dB change in the sample at 1550 nm for single-mode and 0.4 dB at 1300 nm for multimode fibers shall constitute failure.

3.1.1.8 Creep test

A creep test shall be carried out in accordance with 4.1.1.8. Values shall correspond with the manufacturer's recommendations.

3.1.1.9 Stress/strain test

A stress/strain test shall be carried out in accordance with 4.1.1.9. The maximum rated cable load (MRCL), maximum rated cable strain (MRCS), and maximum axial fiber strain specified by the manufacturer for their cable design shall be verified. Any visual damage to the cable, or a permanent or temporary increase in optical attenuation greater than 0.10 dB at 1550 nm for single-mode fibers and 0.20 dB at 1300 nm for multimode fibers, shall constitute failure.

3.1.1.10 Cable cutoff wavelength (single-mode fiber)

The cutoff wavelength of the cabled fiber, λcc, shall be less than 1260 nm.

3.1.1.11 Temperature cycle test

Optical cables shall maintain mechanical and optical integrity when exposed to the following temperature extremes: –40 °C to +65 °C.

The change in attenuation at extreme operational temperatures for single-mode fibers shall not be greater than 0.20 dB/km, with 80% of the measured values no greater than 0.10 dB/km. For single-mode fibers, the attenuation change measurements shall be made at 1550 nm.

For multimode fibers, the change shall not be greater than 0.50 dB/km, with 80% of the measured values no greater than 0.25 dB/km.
The multimode fiber measurements shall be made at 1300 nm unless otherwise specified. A temperature cycle test shall be performed in accordance with 4.1.1.11.

3.1.1.12 Cable aging test

The cable aging test shall be a continuation of the temperature cycle test.

The change in attenuation from the original values observed before the start of the temperature cycle test shall not be greater than 0.40 dB/km, with 80% of the measured values no greater than 0.20 dB/km for single-mode fibers.

For multimode fibers, the change in attenuation shall not be greater than 1.00 dB/km, with 80% of the measured values no greater than 0.50 dB/km.

There shall be no discernible difference between the jacket identification and length marking colors of the aged sample relative to those of an unaged sample of the same cable. The fiber coating color(s) and unit/bundle identifier color(s) shall be in accordance with TIA/EIA 598-A-1992 [B43].

A cable aging test shall be performed in accordance with 4.1.1.12.

3.1.1.13 Ultraviolet (UV) resistance test

The cable and jacket system is expected to perform satisfactorily in the user-specified environment into which the cable is being placed into service. Because of the numerous possible environmental locations, it is the user's and supplier's joint responsibility to provide the particular performance requirements of each installation location. These performance criteria are for nonsevere environments. The IEC 60068-2-1 [B12] performance standards should be used to define particular environmental testing requirements for each unique location.

The cable jacket shall meet the following requirements:

Where carbon black is used as a UV damage inhibitor, the cable shall have a minimum absorption coefficient of 0.32 per meter.

Where other cable UV blocking systems are employed, the cable shall:

a) Meet the equivalent UV performance of carbon black at 0.32 per meter
b) Meet the performance requirements as stated in 4.1.1.13 for IEC 60068-2-1 [B12] testing.
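The acceptance criteria in 3.1.1.11 and 3.1.1.12 share a common pattern: an absolute cap on the attenuation change plus an "80% of the measured values" sub-limit. Purely as an illustration of how such a criterion can be evaluated (this sketch is not part of the standard; the function name and the sample measurements are invented), in Python:

def attenuation_change_ok(deltas_db_per_km, max_change, tight_change, tight_fraction=0.8):
    # True iff no measured change exceeds max_change, and at least
    # tight_fraction of the measured changes do not exceed tight_change.
    within = sum(1 for d in deltas_db_per_km if d <= tight_change)
    return (max(deltas_db_per_km) <= max_change
            and within >= tight_fraction * len(deltas_db_per_km))

# Single-mode temperature-cycle criterion of 3.1.1.11 (0.20 / 0.10 dB/km):
print(attenuation_change_ok([0.05, 0.08, 0.12, 0.09, 0.07], 0.20, 0.10))  # True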

Myers-Briggs Type Indicator Assessment Manual

Myers-Briggs Type Indicator (MBTI)
by James Lani

The Myers-Briggs Type Indicator is a commonly used instrument for the evaluation of a person's personality and behavior. There currently exist five forms of the MBTI: Form M, Form M self-scorable, Form G, Form G self-scorable, and Form Q. The test has been translated into 21 different languages and has established itself as a useful method for improving performance, choosing careers, and reducing workplace conflict.

Authors

Isabel Briggs Myers

Validity and Reliability

Based on the most recent forms of the MBTI (M and Q), the internal consistency was .90 for Form M and .77 for Form Q. A sample of 3,009 people representing a national sample was used for Form M, whereas a nationally representative sample of 1,378 was used for Form Q. The test is given to 2 million people every year; in addition, the test is used by companies and researchers. The MBTI is an established instrument used for the analysis of personality.

Obtaining the MBTI

CPP

Administration, Analysis and Reporting

Statistics Solutions consists of a team of professional methodologists and statisticians that can assist the student or professional researcher in administering the survey instrument, collecting the data, conducting the analyses, and explaining the results.

Dissertations Using the Myers-Briggs Type Indicator

Below is a list of dissertations that use the MBTI. The full versions of these dissertations can be found using ProQuest.

Li, Y. (2003). Assessment of nursing college students' learning styles in Taiwan using the Myers-Briggs Type Indicator. University of Southern California.

Stauning-Santiago, B. (2003). Identification of at-risk nursing students using the Myers-Briggs Type Indicator and Holland's Vocational Preference Inventory. State University of New York at Albany.

Horstein, C. A. (1995). Identification of personality types of associate degree nursing students and faculty based on the Myers-Briggs Type Indicator. Pepperdine University.

Puyleart, B. L. (2006). Learning styles of baccalaureate nursing students using the Myers-Briggs Type Indicator. Marian College of Fond du Lac.

Zitkus, B. S. (2008). The relationship among registered nurses' personality type, weight status, weight loss motivating factors, weight loss regimens, and successful or unsuccessful weight loss. Dowling College.

Data Mining: KDD Cup 2003 Datasets

KDD Cup 2003 Datasets

Data abstract: The e-print arXiv, initiated in Aug 1991, has become the primary mode of research communication in multiple fields of physics, and some related disciplines. It currently contains over 225,000 full text articles and is growing at a rate of 40,000 new submissions per year. It provides nearly comprehensive coverage of large areas of physics, and serves as an on-line seminar system for those areas. It serves 10 million requests per month, including tens of thousands of search queries per day. Its collections are a unique resource for algorithmic experiments and model building. Usage data has been collected since 1991, including Web usage logs beginning in 1993. On average, the full text of each paper was downloaded over 300 times since 1996, and some were downloaded tens of thousands of times.

Keywords: KDD Cup, datasets, research communication, physics, articles, algorithmic experiments, model building

Data format: TEXT

Data usage: Social Network Analysis; Information Processing; Classification

Detailed description:

KDD Cup 2003 Datasets

News

Sept 5, 2003: Presentation slides from the KDD conference are now available.
August 20, 2003: Scores for the winners of Tasks 1-3 have been posted.
August 19, 2003: Solutions for Tasks 1-3 have been posted.
August 18, 2003: Results for Task 1 have been posted.
August 15, 2003: Results for Tasks 2, 3, and 4 have been posted. The winners for Task 1 will be announced by August 18.

Introduction

Welcome to KDD Cup 2003, a knowledge discovery and data mining competition held in conjunction with the Ninth Annual ACM SIGKDD Conference. This year's competition focuses on problems motivated by network mining and the analysis of usage logs. Complex networks have emerged as a central theme in data mining applications, appearing in domains that range from communication networks and the Web, to biological interaction networks, to social networks and homeland security. At the same time, the difficulty in obtaining complete and accurate representations of large networks has been an obstacle to research in this area.

This KDD Cup is based on a very large archive of research papers that provides an unusually comprehensive snapshot of a particular social network in action; in addition to the full text of research papers, it includes both explicit citation structure and (partial) data on the downloading of papers by users. It provides a framework for testing general network and usage mining techniques, which will be explored via four varied and interesting tasks. Each task is a separate competition with its own specific goals.

The first task involves predicting the future; contestants predict how many citations each paper will receive during the three months leading up to the KDD 2003 conference. For the second task, contestants must build a citation graph of a large subset of the archive from only the LaTeX sources. In the third task, each paper's popularity will be estimated based on partial download logs. And the last task is open! Given the large amount of data, contestants can devise their own questions, and the most interesting result is the winner.

Data Description

See the data abstract above for the arXiv collection and usage statistics.
The Stanford Linear Accelerator Center SPIRES-HEP database has been comprehensively cataloguing the High Energy Particle Physics (HEP) literature online since 1974, and indexes more than 500,000 high-energy physics related articles, including their full citation tree.
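The first task operates over this directed citation graph. Purely as an illustration of the data shape (the paper identifiers below are invented), the historical citation counts a predictor would start from can be obtained by counting in-degrees over an edge list of (citing, cited) pairs:

from collections import Counter

citations = [("hep-th/0101001", "hep-th/9901001"),
             ("hep-th/0102002", "hep-th/9901001"),
             ("hep-th/0103003", "hep-th/9902002")]

# Each paper's citation count is its in-degree in the citation graph.
in_degree = Counter(cited for _, cited in citations)
print(in_degree.most_common(2))  # [('hep-th/9901001', 2), ('hep-th/9902002', 1)]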

Documentation for the DDRTree Package

Package 'DDRTree'
October 12, 2022

Type: Package
Title: Learning Principal Graphs with DDRTree
Version: 0.1.5
Date: 2017-4-14
Author: Xiaojie Qiu, Cole Trapnell, Qi Mao, Li Wang
Depends: irlba
Imports: Rcpp
LinkingTo: Rcpp, RcppEigen, BH
Maintainer: Xiaojie Qiu <***********>
Description: Provides an implementation of the framework of reversed graph embedding (RGE) which projects data into a reduced dimensional space while constructing a principal tree that passes through the middle of the data simultaneously. DDRTree shows superiority to alternatives (Wishbone, DPT) for inferring the ordering as well as the intrinsic structure of single-cell genomics data. In general, it could be used to reconstruct the temporal progression as well as the bifurcation structure of any data type.
License: Artistic License 2.0
RoxygenNote: 6.0.1
SystemRequirements: C++11
NeedsCompilation: yes
Repository: CRAN
Date/Publication: 2017-04-30 20:54:17 UTC

R topics documented:
DDRTree
get_major_eigenvalue
pca_projection_R
sqdist_R

DDRTree — Perform DDRTree construction

Description

Perform DDRTree construction. This is an R and C code implementation of the DDRTree algorithm from Qi Mao, Li Wang et al.: Qi Mao, Li Wang, Steve Goodison, and Yijun Sun. Dimensionality Reduction via Graph Structure Learning. The 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'15), 2015, /citation.cfm?id=2783309. It performs dimension reduction and principal graph learning simultaneously. Please cite this package and the KDD'15 paper if you found DDRTree useful for your research.

Usage

DDRTree(X, dimensions = 2, initial_method = NULL, maxIter = 20, sigma = 0.001,
        lambda = NULL, ncenter = NULL, param.gamma = 10, tol = 0.001, verbose = F, ...)

Arguments

X — a matrix with D × N dimension which is needed to perform DDRTree construction
dimensions — reduced dimension
initial_method — a function that takes the transpose of X as input and outputs the reduced dimension; the row number should not be larger than the number of observations and the column number should not be larger than the number of variables (e.g. isomap may only return a matrix on valid sample sets). Sample names of the returned reduced dimension should be preserved.
maxIter — maximum iterations
sigma — bandwidth parameter
lambda — regularization parameter for inverse graph embedding
ncenter — number of nodes allowed in the regularization graph
param.gamma — regularization parameter for k-means (the prefix 'param' is used to avoid a name collision with gamma)
tol — relative objective difference
verbose — emit extensive debug output
... — additional arguments passed to DDRTree

Value

A list with W, Z, stree, Y, history. W is the orthogonal set of d (dimensions) linear basis vectors; Z is the reduced dimension space; stree is the smooth tree graph embedded in the low dimension space; Y represents the latent points as the centers of Z.

Introduction

The unprecedented increase in big data causes huge difficulty in data visualization and downstream analysis. Conventional dimension reduction approaches (for example, PCA, ICA, Isomap, LLE, etc.) are limited in their ability to explicitly recover the intrinsic structure from the data as well as the discriminative feature representation, both of which are important for scientific discovery. The DDRTree algorithm is a new algorithm that performs the following three tasks in one setting:

1. Reduce high dimension data into a low dimension space
2. Recover an explicit smooth graph structure with local geometry only captured by distances of data points in the low dimension space
3. Obtain clustering structures of data points in reduced dimension

Dimensionality reduction via graph structure learning

Reverse graph embedding has previously been
applied to learn the intrinsic graph structure in the original dimension. The optimization of graph inference can be represented as:

min_{f_g ∈ F} min_{z_1,...,z_M} Σ_{(V_i,V_j) ∈ E} b_{i,j} ||f_g(z_i) − f_g(z_j)||²

where f_g is a function to map the intrinsic data space Z = {z_1, ..., z_M} back to the input data space (reverse embedding) X = {x_1, ..., x_N}. V_i is the vertex of the intrinsic undirected graph G = (V, E). b_{i,j} is the edge weight associated with the edge set E. In order to learn the intrinsic structure from a reduced dimension, we also need to consider a term which includes the error during the learning of the intrinsic structure. This strategy is incorporated as follows:

min_{G ∈ Ĝ_b} min_{f_g ∈ F} min_{z_1,...,z_M} Σ_{i=1}^{N} ||x_i − f_g(z_i)||² + (λ/2) Σ_{(V_i,V_j) ∈ E} b_{i,j} ||f_g(z_i) − f_g(z_j)||²

where λ is a non-negative parameter which controls the tradeoff between the data reconstruction error and the reverse graph embedding.

Dimensionality reduction via learning a tree

The general framework for reducing dimension by learning an intrinsic structure in a low dimension requires a feasible set Ĝ_b of graphs and a mapping function f_G. The algorithm uses a minimum spanning tree as the feasible tree graph structure, which can be solved by Kruskal's algorithm. A linear projection model f_g(z) = Wz is used as the mapping function. Those settings result in the following specific form of the previous framework:

min_{W,Z,B} Σ_{i=1}^{N} ||x_i − W z_i||² + (λ/2) Σ_{i,j} b_{i,j} ||W z_i − W z_j||²

where W = [w_1, ..., w_d] ∈ R^{D×d} is an orthogonal set of d linear basis vectors. We can group the tree graph B on one side, and the orthogonal set of linear basis vectors W together with the projected points Z in reduced dimension on the other, and apply alternating structure optimization to optimize the tree graph. This method is defined as DRtree (Dimension Reduction tree) as discussed by the authors.

Discriminative dimensionality reduction via learning a tree

In order to avoid the issue where data points are scattered into different branches (which leads to loss of cluster information) and to incorporate discriminative information, another set of points {y_k}, k = 1, ..., K, as the centers of {z_i}, i = 1, ..., N, can also be introduced. By so doing, the objective functions of K-means and DRtree can be simultaneously minimized. The authors further proposed a soft partition method to account for the limits of K-means and proposed the following objective function:

min_{W,Z,B,Y,R} Σ_{i=1}^{N} ||x_i − W z_i||² + (λ/2) Σ_{k,k'} b_{k,k'} ||W y_k − W y_{k'}||² + γ Σ_{k=1}^{K} Σ_{i=1}^{N} r_{i,k} ||z_i − y_k||² + σ Ω(R)

s.t. W^T W = I, B ∈ B, Σ_{k=1}^{K} r_{i,k} = 1, r_{i,k} ≥ 0, ∀i, ∀k

where R ∈ R^{N×K}, Ω(R) = Σ_{i=1}^{N} Σ_{k=1}^{K} r_{i,k} log r_{i,k} is the negative entropy regularization which transforms the hard assignments used in K-means into soft assignments, and σ > 0 is the regularization parameter. Alternating structure optimization is again used to solve the above problem by separately optimizing each group W, Z, Y, B, R until convergence.

The actual algorithm of DDRTree

1. Input: data matrix X, parameters λ, σ, γ
2. Initialize Z by PCA
3. K = N, Y = Z
4. Repeat:
5.   d_{k,k'} = ||y_k − y_{k'}||², ∀k, ∀k'
6.   Obtain B via Kruskal's algorithm
7.   L = diag(B·1) − B
8.   Compute R elementwise (the soft assignment of each z_i to the centers y_k)
9.   τ = diag(1^T R)
10.  Q = (1/(1+γ)) · [I + R(((1+γ)/γ)((λ/γ)L + τ) − R^T R)^{−1} R^T]
11.  C = X Q X^T
12.  Perform eigendecomposition on C such that C = U Λ U^T and diag(Λ) is sorted in descending order
13.  W = U(:, 1:d)
14.  Z = W^T X Q
15.  Y = Z R ((λ/γ)L + τ)^{−1}
16. Until convergence
from the authors of the DDRTree paper.

Examples

data(iris)
subset_iris_mat <- as.matrix(t(iris[c(1, 2, 52, 103), 1:4]))  # subset the data
# run DDRTree with ncenter equal to the number of species
DDRTree_res <- DDRTree(subset_iris_mat, dimensions = 2, maxIter = 5, sigma = 1e-2,
                       lambda = 1, ncenter = 3, param.gamma = 10, tol = 1e-2, verbose = FALSE)
Z <- DDRTree_res$Z  # obtain the reduced-dimension matrix
Y <- DDRTree_res$Y
stree <- DDRTree_res$stree
plot(Z[1, ], Z[2, ], col = iris[c(1, 2, 52, 103), 'Species'])  # reduced dimension
legend("center", legend = unique(iris[c(1, 2, 52, 103), 'Species']), cex = 0.8,
       col = unique(iris[c(1, 2, 52, 103), 'Species']), pch = 1)  # legend
title(main = "DDRTree reduced dimension", col.main = "red", font.main = 4)
dev.off()
plot(Y[1, ], Y[2, ], col = 'blue', pch = 17)  # centers of Z
title(main = "DDRTree smooth principal curves", col.main = "red", font.main = 4)

# run DDRTree with ncenter set to NULL
DDRTree_res <- DDRTree(subset_iris_mat, dimensions = 2, maxIter = 5, sigma = 1e-3,
                       lambda = 1, ncenter = NULL, param.gamma = 10, tol = 1e-2, verbose = FALSE)
Z <- DDRTree_res$Z  # obtain the reduced-dimension matrix
Y <- DDRTree_res$Y
stree <- DDRTree_res$stree
plot(Z[1, ], Z[2, ], col = iris[c(1, 2, 52, 103), 'Species'])  # reduced dimension
legend("center", legend = unique(iris[c(1, 2, 52, 103), 'Species']), cex = 0.8,
       col = unique(iris[c(1, 2, 52, 103), 'Species']), pch = 1)  # legend
title(main = "DDRTree reduced dimension", col.main = "red", font.main = 4)
dev.off()
plot(Y[1, ], Y[2, ], col = 'blue', pch = 2)  # centers of Z
title(main = "DDRTree smooth principal graphs", col.main = "red", font.main = 4)

get_major_eigenvalue — Get the top L eigenvalues

Description

Get the top L eigenvalues.

Usage

get_major_eigenvalue(C, L)

Arguments

C — data matrix used for eigendecomposition
L — number of top eigenvalues

pca_projection_R — Compute the PCA projection

Description

Compute the PCA projection.

Usage

pca_projection_R(C, L)

Arguments

C — data matrix used for PCA projection
L — number of top principal components

sqdist_R — Calculate the square distance between a and b

Description

Calculate the square distance between a and b.

Usage

sqdist_R(a, b)

Arguments

a — a matrix with D × N dimension
b — a matrix with D × N dimension

Value

a numeric value for the difference between a and b

Index

DDRTree
DDRTree-package (DDRTree)
get_major_eigenvalue
pca_projection_R
sqdist_R

The Origins of getrealmetrics

Step 1: Understanding the Context

Before delving into the specifics of the origins and significance of "getrealmetrics," it is crucial to comprehend the broader context of the subject. In today's data-driven digital landscape, measuring and analyzing various metrics have become instrumental in decision-making processes for individuals and organizations alike. Entrepreneurs, marketers, and leaders have recognized the value of data as a tool for understanding and improving their activities.

Step 2: Introduction to Metrics

Metrics, in the context of business and marketing, refer to quantifiable measurements used to assess performance, progress, and success. These measurements are collected and analyzed to gain insights and inform decision-making. Companies and individuals use metrics to monitor key performance indicators (KPIs), identify areas for improvement, set targets, and evaluate outcomes.

Step 3: Evolution of Metrics in the Digital Age

As technology advanced, the proliferation of digital platforms offered new opportunities for data collection and analysis. This led to an increased demand for more accurate and meaningful metrics. In the early days of the internet, metrics were primarily focused on website traffic, page views, and click-through rates. However, as online marketing and advertising expanded, so too did the need for more sophisticated metrics that could measure the impact and effectiveness of these activities.

Step 4: The Rise of Getrealmetrics

It is within this evolving landscape that "getrealmetrics" emerged as a concept and a term. While there may not be a specific source or origin for the term, it represents an overarching principle that emphasizes the importance of using accurate, reliable, and relevant metrics in decision-making processes.

Step 5: Understanding the Significance of Getrealmetrics

The term "getrealmetrics" encapsulates the idea that businesses should focus on obtaining metrics that truly reflect their goals and objectives. It emphasizes the need to avoid vanity metrics, which may appear impressive on the surface but do not provide meaningful insights or contribute to the bottom line.

Step 6: Differentiating Real Metrics from Vanity Metrics

Vanity metrics are measurements that may look impressive but lack substance. Typical vanity metrics include the number of social media followers, website views, or app downloads. While these metrics may indicate popularity or awareness, they do not necessarily reflect meaningful engagement or contribute to the intended outcomes of a business or organization.

Step 7: Identifying Real Metrics

Real metrics, on the other hand, are those that align with an organization's goals and provide actionable insights. These metrics focus on outcomes that impact revenue, customer satisfaction, retention, or other key objectives. Real metrics might include conversion rates, customer lifetime value, return on investment, or customer satisfaction scores.

Step 8: Implementing the Getrealmetrics Approach

To implement the getrealmetrics approach effectively, businesses need to identify their specific goals and key performance indicators. They must analyze their target audience, understand what metrics are most relevant to their objectives, and ensure that data collection methods are accurate and reliable.
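As an illustration of the contrast drawn above between vanity and real metrics, the following minimal sketch (all function names and figures are invented) computes two of the outcome metrics named in Step 7:

def conversion_rate(purchases, visitors):
    # Fraction of visitors who completed a purchase -- an outcome metric,
    # unlike the raw visitor count, which is a vanity metric.
    return purchases / visitors

def customer_lifetime_value(avg_order_value, orders_per_year, retention_years):
    # A deliberately simple CLV estimate: average yearly revenue per
    # customer times the expected customer lifetime in years.
    return avg_order_value * orders_per_year * retention_years

# 10,000 page views may look impressive, but the real metrics tell the story:
print(conversion_rate(purchases=150, visitors=10_000))   # 0.015
print(customer_lifetime_value(40.0, 4.0, 2.5))           # 400.0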
Step 9: Case Studies and Examples

To provide a practical understanding of the getrealmetrics approach, case studies and examples can be utilized. These would demonstrate how businesses achieved success by using real metrics to drive decision-making and improve their outcomes.

Step 10: Conclusion

In conclusion, getrealmetrics is a concept that emphasizes the importance of using accurate, relevant, and meaningful metrics in decision-making processes. It encourages businesses to avoid vanity metrics and focus on metrics that align with their goals and objectives. By adopting the getrealmetrics approach, organizations can gain valuable insights, make informed decisions, and drive meaningful outcomes in the digital age.

The Homework-Writing Machine (English Translation)

The machine that can do homework is a marvel of modern technology, designed to assist students in completing their assignments with greater efficiency and accuracy. Here is how it works:

1. Text Input: The machine starts by accepting text input from the user. This could be a description of the homework task or a specific question that needs to be answered.
2. Language Processing: It uses advanced natural language processing (NLP) algorithms to understand the context and requirements of the homework.
3. Knowledge Base: The machine accesses a vast knowledge base that contains information on various subjects, which it uses to generate accurate responses.
4. Problem Solving: For math problems or logic-based questions, the machine employs algorithms to solve the problems step by step, providing a clear and concise solution.
5. Writing Assistance: For essay writing or report tasks, the machine can generate drafts, outline structures, and even write complete paragraphs based on the given topic and instructions.
6. Proofreading and Editing: After the initial draft is generated, the machine checks for grammar, spelling, and punctuation errors, ensuring the homework is polished and professional.
7. Citation and Referencing: If the homework requires research and citation, the machine can also assist in generating accurate citations and references in various formats such as APA, MLA, or Chicago.
8. Customization: Users can customize the output according to their preferences, including the tone, style, and complexity of the language used in the homework.
9. Feedback Integration: The machine can learn from user feedback to improve the quality of future outputs, making it more tailored to individual needs.
10. Privacy and Security: It ensures that all user data and homework content are kept private and secure, respecting the confidentiality of the student's work.

This machine is not just a tool for completing homework but also a learning aid that helps students understand concepts better and improve their writing skills over time.

The Boy Is Doing His English Homework

The boy is currently engaged in completing his homework assignments in English. He is focused on the tasks at hand, which may include reading comprehension exercises, vocabulary practice, grammar drills, or writing compositions. His workspace is organized with all the necessary materials such as textbooks, notebooks, pens, and perhaps a dictionary for reference.

As he works through the assignments, he carefully reads the questions and instructions, ensuring that he understands what is being asked of him. He takes notes when necessary, jotting down key points or unfamiliar words to look up later. The boy also uses his English language skills to think critically about the content, making connections between what he is learning and his own experiences or prior knowledge.

For the reading comprehension sections, he reads the passages attentively, underlining or highlighting important information that will help him answer the questions that follow. He may also annotate the text with his own thoughts and interpretations.

When it comes to vocabulary exercises, the boy takes the time to learn new words, understanding their meanings and how to use them correctly in sentences. He might practice these words in context or create flashcards to help with memorization.

Grammar exercises require the boy to apply the rules of English syntax and structure. He might be asked to identify parts of speech, correct sentence errors, or construct sentences using specific grammatical structures.

For writing assignments, the boy drafts his compositions, ensuring that they are coherent, well-structured, and free of grammatical errors. He may brainstorm ideas, outline his thoughts, and then write a first draft. Afterward, he reviews and revises his work, making sure it meets the requirements set by his teacher.

Throughout the process, the boy may encounter challenges or difficulties, but he remains persistent, seeking help from teachers, classmates, or online resources when needed. His dedication to his English homework reflects his commitment to improving his language skills and academic performance.

An Analysis of the s4nd Method

The s4nd method is a complex algorithm used for data analysis and interpretation. It is widely used in various industries, including finance, healthcare, and marketing.

One of the key features of the s4nd method is its ability to handle large volumes of data. This is particularly important in today's data-driven world, where companies and organizations are dealing with massive amounts of information.

Another advantage of the s4nd method is its versatility. It can be used for various types of analysis, such as trend analysis, pattern recognition, and predictive modeling. This makes it a valuable tool for decision-making and problem-solving in a wide range of scenarios.

Task-Based Language Teaching

Tasks are proposed as useful vehicles for applying these principles. Early applications of this approach in the Malaysian and Bangalore Projects were short-lived, but the role of tasks received further support from SLA research, which found little evidence of success in grammar-focused teaching activities. Engaging learners in task work provides a better context for the activation of learning processes than form-focused activities, and hence ultimately provides better opportunities for language learning to take place. Language learning is believed to depend on immersing students not merely in "comprehensible input" but in tasks that require them to negotiate meaning and engage in naturalistic and meaningful communication.
Approach: Theory of language
• Though TBLT is primarily motivated by a theory of learning, several assumptions about the nature of language underlie approaches to TBLT:

An Overview of the UCI Datasets (Translation)

Source: /doc/e411396849.html,/ml/datasets.html?format=&task=&att=&area=&numAtt=&numIns=&type=&sort=nameUp&view=list

206 Data Sets

1. Abalone: Predict the age of abalone from physical measurements.

2. Abscisic Acid Signaling Network: The objective is to determine the set of boolean rules that describe the interactions of the nodes within this plant signaling network. The dataset includes 300 separate boolean pseudodynamic simulations using an asynchronous update scheme.

3. Acute Inflammations: The data was created by a medical expert as a data set to test an expert system, which will perform the presumptive diagnosis of two diseases of the urinary system.


System of KDD Tasks and Results within the STULONG Project

Petr Dolejší (2), Václav Lín (1,2), Jan Rauch (1,2) and Michal Šebek (2)

1 Department of Information and Knowledge Engineering, University of Economics, Prague, W. Churchill Sq. 4, 130 67 Praha 3, Czech Republic
{xdolp15, xlinv05, rauch, xsebm08}@vse.cz
2 European Centre for Medical Informatics, Statistics and Epidemiology – Cardio, University of Economics, Prague, W. Churchill Sq. 4, 130 67 Praha 3, Czech Republic

Abstract. KDD results acquired on the Challenge data are shown. It is argued that to gain an insight into the data, it is important to suitably structure the already discovered knowledge. The needs of knowledge reuse are further addressed. We propose structuring our WWW presentation of discovered knowledge in accord with a hierarchical decomposition of KDD tasks within the project. A formal description of KDD tasks is attempted. Possible applications of the formal approach in management of the discovered knowledge are sketched.

1 Introduction

The Challenge data were taken over from STULONG – a research project (http://euromise.vse.cz/challenge/) currently underway at the European Center for Medical Informatics, Statistics and Epidemiology, Prague. We have been participating in its part concerning the development of techniques for KDD. We have been mining association rules with the 4ft-Miner [7] procedure. As we have dealt with complex data, the number of various analyses that could have been performed on the data has also grown large. We have identified reuse of the discovered knowledge as an important issue. To enhance the possibilities of reuse of the discovered knowledge, we have structured our project's WWW presentation in accord with the hierarchy of KDD tasks. We have also attempted to describe the hierarchy of KDD tasks formally.

In section 2, the analysed data are briefly described and several notions concerning association rule mining with the 4ft-Miner procedure are introduced. Section 3 shows some of the results mined in the data. In particular, section 3.5 discusses related problems and stresses the need for knowledge reuse. Section 4 describes how we have addressed this issue, i.e. how the presentation of obtained results is structured. Formalization of KDD tasks is attempted in section 5. Section 5.5 in particular discusses possible utilization of such a formal approach in management of the discovered knowledge. Section 6 concludes the paper.

2 Attribute Groups and 4ft-Association Rules

2.1 Attribute Groups

We have been analysing a data matrix containing the results of observation of 244 attributes from the entry examination of each patient. About 1500 male patients participated in the research. Attributes from entry examinations are divided into several groups: physical, social characteristics, biochemical, alcohol consumption, and other. Attributes from control examinations concern mainly changes in the patient's behavior, social status, drug consumption, etc. For research purposes, patients (i.e. database objects) were divided into several (non-overlapping) groups such as "patients with cardiological risk", "normal patients", or "patients with pathologic symptoms".

2.2 4ft-Association Rules and Conditional Association Rules

We mine for association rules with a procedure called 4ft-Miner [7]. To distinguish the 4ft-Miner's association rules from rules that are mined by other data mining systems, we call our rules 4ft-association rules. In this section, we will introduce only the most basic notions; see section 5 and references ([1], [2], [6]) for more information.

4ft-association rules are expressions of the form ϕ ≈ ψ. In the 4ft-association rule ϕ ≈ ψ, ϕ is called antecedent, and ψ is
called succedent. ϕ and ψ are symbolic names of database objects' properties. An example of such a symbolic name could be

diast1(75...85) ∧ height(170...180),

standing for the property of having diastolic blood pressure in the range from 75 to 85 and being from 170 to 180 cm high. Syntactically, ϕ and ψ are formulae, see section 5.1. We will say that a database object satisfies a formula if it has the property denoted by the formula.

The intuitive meaning of ϕ ≈ ψ is that ϕ and ψ are in a relation given by the symbol ≈, which is called a 4ft-quantifier. There are many 4ft-quantifiers, each of them corresponding to some kind of relationship in the data. Each 4ft-quantifier corresponds to some condition concerning the 4ft-table, see Tab. 1. The association rule ϕ ≈ ψ is evaluated on the basis of the condition corresponding to the 4ft-quantifier ≈. In the 4ft-table, a is the number of objects from the analyzed data matrix satisfying both ϕ and ψ, b is the number of objects satisfying ϕ and not satisfying ψ, etc.

        ψ    ¬ψ
  ϕ     a     b
  ¬ϕ    c     d

Tab. 1 – 4ft-table of ϕ and ψ

By suitable conditions concerning the 4ft-table, various types of dependencies of ϕ and ψ can be expressed, i.e. there can be various 4ft-quantifiers (see also section 5.2). We will introduce two of them.

The quantifier ⇒_{p,s} of founded implication [1] for 0 < p ≤ 1 and s > 0 is defined by the condition a/(a+b) ≥ p ∧ a ≥ s. It means that at least 100p per cent of the objects satisfying (in the given data) ϕ satisfy also ψ, and that there are at least s objects satisfying both ϕ and ψ.

The above-average quantifier ⇒⁺_{p,s} for 0 < p and s > 0 (see also http://lispminer.vse.cz/overview/4ft_quantifier.html) is defined by the condition a/(a+b) ≥ (1+p) · (a+c)/(a+b+c+d) ∧ a ≥ s. It means that among the objects satisfying ϕ, the relative frequency of objects satisfying ψ is at least 100p per cent higher than the relative frequency of objects satisfying ψ among all the objects in the database.

Let us give two examples of 4ft-association rules from our application. The rule

diast1(80...85) ∧ weight(75...80) ⇒_{0.5;20} chlst(200...240)

is interpreted as: "50% of patients with diastolic blood pressure in (80, 85) and weight in (75, 80) also have cholesterol in (200, 240); there are 20 such patients in total." The rule

diast1(75...85) ∧ height(170...180) ⇒⁺_{0.60;28} chlst(180...200)

is interpreted as: "Among patients satisfying diast1(75...85) ∧ height(170...180), there are at least 60% more patients satisfying chlst(180...200) than there are patients satisfying chlst(180...200) in the whole data matrix; there are 28 such patients in total."

It is important to note that the above-average quantifier is symmetric, i.e. the following interpretation is equivalent: "Among patients satisfying chlst(180...200), there are at least 60% more patients satisfying diast1(75...85) ∧ height(170...180) than there are patients satisfying diast1(75...85) ∧ height(170...180) in the whole data matrix; there are 28 such patients in total."

There are also conditional association rules of the form ϕ ≈ ψ / ξ, where ξ is a symbolic name of a database objects' property. The intuitive meaning of such a rule is that ϕ and ψ are in relation ≈ when formula ξ is satisfied. Evaluation of conditional association rules is as described above, except for the fact that the frequencies a, b, c and d are calculated only for objects satisfying ξ.
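To make the evaluation of these quantifiers concrete, here is a minimal sketch — not the 4ft-Miner implementation; the frequencies of the sample 4ft-table are invented — that checks both defining conditions for given frequencies a, b, c, d:

def founded_implication(a, b, p, s):
    # phi =>_{p,s} psi holds iff a/(a+b) >= p and a >= s.
    return a / (a + b) >= p and a >= s

def above_average(a, b, c, d, p, s):
    # phi =>+_{p,s} psi holds iff a/(a+b) >= (1+p)*(a+c)/(a+b+c+d) and a >= s.
    return a / (a + b) >= (1 + p) * (a + c) / (a + b + c + d) and a >= s

a, b, c, d = 28, 14, 180, 1278                   # a hypothetical 4ft-table
print(founded_implication(a, b, p=0.5, s=20))    # True: 28/42 = 0.67 >= 0.5
print(above_average(a, b, c, d, p=0.60, s=28))   # True: 0.67 >= 1.6 * 208/1500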
3 Examples of Discovered Knowledge

We were examining associational and implicational relations between the group of biochemical attributes and the group of physical attributes. These examinations were done within the following patient groups: the normal group, the risk group, and the pathologic group. We have used the founded implication quantifier and the above-average quantifier. In all the tasks (Tasks 1–4):

– antecedents were derived from physical attributes: diastolic blood pressure (diast1), systolic blood pressure (syst1), body height and weight (height, weight), body mass index (bmi), skinfold thickness at subscapular (subsc), and skinfold thickness at triceps (tric); in Task 4, some additional attributes were also allowed, see 3.4
– succedents were derived from biochemical attributes: urine (moc), cholesterol (chlst), and triglycerides (trigl)
– conditions were used to restrict the computations to the patient group concerned by the particular task; we will not display conditions in our example rules below, for typesetting reasons

We will show results from four KDD tasks. Discussion of the results follows in section 3.5.

3.1 Task 1

The first task concerned founded implications (with p > 0.5 ∧ s > 20) between biochemical and physical attributes within the risk-patient group. There were 6 rules found:

– height(175...185) ∧ syst1(130...140) ∧ weight(80...90) ⇒_{0.56;20} chlst(230...270)
– diast1(80...90) ∧ subsc(17,18) ⇒_{0.52;23} chlst(240...280)
– syst1(120...130) ⇒_{0.51;20} chlst(240...280)
– diast1(80...85) ∧ height(170...180) ∧ syst1(130...140) ⇒_{0.51;21} chlst(220...260)
– syst1(135...145) ⇒_{0.50;20} chlst(220...260)
– diast1(80...85) ∧ weight(75...80) ⇒_{0.5;20} chlst(200...240)

3.2 Task 2

The second task concerned founded implications (with p > 0.6 ∧ s > 10) between biochemical and physical attributes within the pathologic-patient group. There was only 1 rule found:

– diast1(85...95) ⇒_{0.63;10} chlst(240...280)

3.3 Task 3

The third task concerned above-average relations (with p > 0.5 ∧ s > 20) between biochemical and physical attributes within the normal-patient group. There were 19 rules found. We list only the 10 strongest rules:

– diast1(80...85) ∧ bmi(170...180) ⇒⁺_{0.87;26} chlst(180...200)
– diast1(85...95) ∧ height(175...180) ⇒⁺_{0.85;20} chlst(210...240)
– diast1(90...100) ∧ height(175...185) ⇒⁺_{0.84;21} chlst(210...240)
– diast1(80...90) ∧ height(170...180) ⇒⁺_{0.79;21} chlst(190...200)
– diast1(85...95) ∧ height(175...185) ⇒⁺_{0.73;25} chlst(210...240)
– syst1(110...120) ∧ bmi(24...26) ⇒⁺_{0.69;20} chlst(210...250)
– syst1(135...145) ⇒⁺_{0.62;21} chlst(200...220)
– diast1(80...85) ∧ syst1(115...125) ⇒⁺_{0.62;20} chlst(160...200)
– syst1(110...120) ⇒⁺_{0.61;22} chlst(230...250)
– diast1(75...85) ∧ height(170...180) ⇒⁺_{0.60;28} chlst(180...200)

3.4 Task 4

The fourth task was a specialization of the third task in the sense that non-physical attributes were also allowed in the antecedent. These were the following: leisure-time activities (aktpozam), alcohol consumption (alkohol), pain in the legs (boldk), and pain in the chest (bolhr). There were 23 rules generated. We will list only the three most interesting rules:

– diast1(80...85) ∧ height(170...180) ∧ boldk(1) ⇒⁺_{0.95;24} chlst(180...200)
– syst1(110...120) ∧ bmi(24...26) ∧ boldk(1) ⇒⁺_{0.83;20} chlst(210...250)
– diast1(90...100) ∧ height(175...185) ∧ boldk(1) ⇒⁺_{0.81;20} chlst(200...240)

3.5 Discussion

What are the questions we might ask after getting the results given in sections 3.1, 3.2, 3.3, and 3.4? Apparently, blood pressure (systolic or diastolic) plays its role in almost all the discovered rules in all the tasks. Therefore, one might want to examine KDD tasks that would focus on blood pressure more closely. Task 2 proved only one rule; one might want to know whether this rule holds only in the pathologic-patient group, or also in other groups. The results from the fourth task are very interesting. The discovered rules are very strong (p > 0.80).
Also, it appears that the pain-in-the-legs attribute (boldk) plays an important role in connection with cholesterol levels. Again, examining further tasks featuring the boldk attribute might be of interest. Another feature of the partial results is that no rule with moc or trigl in the succedent was generated. It would be interesting to find out why.

Although the questions arising from our examples might seem trivial to a domain expert, they outline the problem we have been encountering all through our data mining efforts. With complex data, the number of interesting analyses that can be performed can range from several dozens to a couple of hundreds. Apparently, the partial results give users only a very limited insight into the data.

To gain a greater insight, one has to either perform new analyses, or check whether relevant analyses were not already performed. To gain such an insight, it is necessary to compare results of different analyses, consider value distributions of attributes, summarize the acquired results, etc. Since it would be grossly inefficient to re-run already performed analyses, well-structured presentation and good organization of discovered knowledge grow in importance, i.e. retrieval and reuse of KDD results becomes an issue.

4 WWW Presentation of KDD Tasks and Results

To reuse, share and retrieve discovered knowledge, we have been developing a WWW presentation of KDD results (http://euromise.vse.cz/; please note that the web site is still under development). To facilitate navigation through pages describing particular analyses and discovered knowledge, we have structured the presentation in accord with a hierarchical decomposition of KDD tasks within the project.

A KDD task is generally a verbal explication of some range of research problems which are to be examined in the given data. Indeed, KDD tasks differ in their generality. General tasks can be broken up into more specific tasks, these into yet more specific ones, and so on. As mentioned above, attributes in our data set are divided into several groups: physical attributes, biochemical attributes, etc. In our application, the most general KDD tasks concern relations between attribute groups ("Examine associational relations between physical and social characteristics").

On a more specific level, these KDD tasks are relativized – relations between given attribute groups are examined within separate patient groups ("Examine associational relations between physical and social characteristics within the risk patient group"). Further, these relativized tasks are broken up into examining particular types of relations corresponding to quantifier classes (see section 5.2) – associations, implications, double implications, and equivalences ("Examine implicational relations between physical and social characteristics within the risk patient group").
The last step is the formulation of the most specific KDD tasks, like those in section 3. Thus gradually specifying KDD tasks, we arrive from the most general tasks at the most specific ones.

The structure of the WWW presentation reflects the hierarchy of KDD tasks. The whole presentation can be thought of as consisting of several tree-like structures. A root of one such structure is shown in figure 1. It is a web page containing a triangular table displaying all possible combinations of attribute groups, i.e. fields of the table correspond to the most general KDD tasks (e.g. the field at the intersection of the row "smoking" and the column "social" corresponds to the task of examining relations between the attribute group "smoking" and the attribute group "social characteristics"; the "OK" sign means that the task has already been solved). Internal nodes of the tree structure correspond to more specific KDD tasks, and arcs (i.e. html links) correspond to gradual specialization of the tasks. Leaves of the tree correspond to particular analyses and discovered knowledge. Navigation through the system of web pages is easy and natural, and meets the objective of facilitating knowledge reuse.

Fig. 1. A table at the root of the KDD task hierarchy within the STULONG project.

5 Formalization of KDD Tasks

We are going to describe the hierarchy of KDD tasks formally, i.e. we are going to define a simple formalism capable of describing KDD tasks, and we are going to define the specialization relation on KDD tasks. Since the discovered knowledge is itself formal (4ft-association rules), we will be able to describe formally every part of our WWW presentation.

5.1 Formulae

We will define formulae as symbolic names of database objects' properties. A database object will satisfy a particular formula if it has the property denoted by the formula (these notions were informally used already in section 2.2).

Let A be a database attribute, and let rng(A) be the value range of the attribute A. A literal is an expression of the form A(α), where α ⊂ rng(A) ∧ α ≠ ∅. Formulae (called derived boolean attributes in [5]) are defined as usual:

– every literal A(α) is a formula
– if Φ and Ψ are formulae, then ¬Φ, Φ∧Ψ and Φ∨Ψ are also formulae
– there are no other formulae

Usual conventions concerning parentheses apply. The symbols ¬, ∧ and ∨ are the common propositional connectives of negation, conjunction and disjunction; formulae ¬Φ, Φ∧Ψ and Φ∨Ψ have their obvious meanings. We say that literal A(α) is true for object o in the given database if and only if the value of attribute A of the object o belongs to the set α. Given object o and formula Ψ, we will say that o satisfies Ψ iff Ψ is true for o. It can be easily seen that every literal A(α), α = {a1, a2, ..., an}, can be re-written as a disjunction of literals A(a1) ∨ A(a2) ∨ ... ∨ A(an). Literals A(ai), ai ∈ rng(A), are called one-point literals. The expressions ϕ, ψ and ξ of section 2.2 are obviously formulae – elementary conjunctions of literals (however, the 4ft-Miner also allows for negated literals in conjunctions).
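As an illustration of these definitions (the Python representation is ours, not the 4ft-Miner's; the sample patient record is invented), a literal A(α) can be modelled as a membership test and the connectives as combinators:

def literal(attribute, alpha):
    # A(alpha) is true for object o iff o's value of A belongs to alpha.
    return lambda obj: obj[attribute] in alpha

def NOT(f):    return lambda obj: not f(obj)
def AND(f, g): return lambda obj: f(obj) and g(obj)
def OR(f, g):  return lambda obj: f(obj) or g(obj)

# phi = diast1(75...85) AND height(170...180), as in the example rules above
phi = AND(literal("diast1", range(75, 86)), literal("height", range(170, 181)))
print(phi({"diast1": 80, "height": 176}))  # True: the object satisfies phi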
5.2 Classes of 4ft-Quantifiers

There have been many 4ft-quantifiers defined. Due to their statistical and logical properties, they can be divided into several classes [1], [5], [6].

Associational quantifiers "∼" in rules ϕ ∼ ψ correspond to the intuitive notion of association. Probabilistically: P((ψ∧ϕ)∨(¬ψ∧¬ϕ)) > P((¬ψ∧ϕ)∨(ψ∧¬ϕ)). Note: the above-average quantifier is associational.

Implicational quantifiers "⇒" in rules ϕ ⇒ ψ correspond to the intuitive notion that "almost all ϕ are ψ". In terms of conditional probability: P(ψ|ϕ) > P(ψ). Note: the quantifier of founded implication is implicational.

Double implicational quantifiers "⇔" in rules ϕ ⇔ ψ correspond to the notion that if ϕ ⇒ ψ, then also ψ ⇒ ϕ.

Equivalence quantifiers "≡" in rules ϕ ≡ ψ correspond to the notion that if ϕ ⇒ ψ, then also ¬ϕ ⇒ ¬ψ.

Let A, I, DI and E denote the classes of associational, implicational, double implicational and equivalence 4ft-quantifiers, respectively. The following inclusion relations hold [2]: DI ⊆ I, E ⊆ I, I ⊆ A.

5.3 Pseudo-Rules

The KDD tasks discussed above can be formalized using expressions similar to 4ft-association rules. We will call them pseudo-rules. A pseudo-rule is an expression of the form Φ ≈* Ψ / Ξ, where Φ, Ψ and Ξ are formulae (in the sense of section 5.1). The symbol ≈* stands for a class of 4ft-quantifiers, i.e. ≈* ∈ {∼, ⇒, ⇔, ≡}. The pseudo-rule Φ ≈* Ψ / Ξ is interpreted as a KDD task: "Within the set of objects that satisfy Ξ, examine relations of type ≈* between objects satisfying Ψ and objects satisfying Φ". Let us stress that although syntactically similar to association rules, pseudo-rules correspond to tasks only, not to actual relations in data.

Pseudo-rules are capable of representing very specific KDD tasks, as well as quite general KDD tasks. Consider ϕ ≡ ψ / ξ, where ϕ, ψ, and ξ are elementary conjunctions of literals and/or negated literals. Formulae ϕ, ψ, and ξ are symbolic names of complex characteristics of database objects, and ≡ is a fairly special type of relation (equivalence). On the other hand, consider Φ ∼ Ψ / Ξ, where Φ, Ψ are disjunctions of formulae and Ξ is an empty (hence true) formula. In such a case, Φ and Ψ are symbolic names of groups of characteristics, relations to be found are not restricted by any condition (Ξ is empty), and ∼ is the most general type of relation (association). Therefore, we can use pseudo-rules to represent KDD tasks all along their hierarchy.

5.4 Specialization/Generalization of Pseudo-Rules

Let us attempt to formalize the notion of a specialization/generalization relation between pseudo-rules. We are looking for a partial-order relation on pseudo-rules which would satisfy the following intuition: Given two KDD tasks, represented by two pseudo-rules τ1 and τ2, τ1 = (Φ1 ≈*1 Ψ1/Ξ1), τ2 = (Φ2 ≈*2 Ψ2/Ξ2); τ2 is a specialization of τ1 iff:

– the sets of objects satisfying Φ2, Ψ2 and Ξ2 are subsets of the sets of objects satisfying Φ1, Ψ1 and Ξ1, respectively
– ≈*2 is a special case of ≈*1 (see section 5.2)

Before we proceed to the definition of the specialization relation, we need the notion of associated propositional formulae (see also [6]). Let Φ be a formula. Then the associated propositional formula π(Φ) of Φ is a proposition of propositional calculus constructed by:

– constructing formula Φ′ by replacing all the literals in Φ by the appropriate disjunctions of one-point literals (see section 5.1)
– π(Φ) is the same string of symbols as Φ′, but the particular one-point literals are understood as propositional variables

Now we are ready to define the specialization relation. Let τ1 and τ2 be pseudo-rules, τ1 = (Φ1 ≈*1 Ψ1/Ξ1), τ2 = (Φ2 ≈*2 Ψ2/Ξ2). Let C1 and C2 be the classes of 4ft-quantifiers denoted by ≈*1 and ≈*2, respectively (for example, if ≈* is ⇒, then C is the class of implicational quantifiers). Let → be the propositional connective of implication. We will say that τ2 is a specialization of τ1, symbolically τ1 ⊑ τ2, iff:

– the propositions (π(Φ2) → π(Φ1)), (π(Ψ2) → π(Ψ1)) and (π(Ξ2) → π(Ξ1)) are tautologies of propositional calculus
– C2 ⊆ C1

It can be seen easily that the relation thus defined satisfies the above listed intuitions and is transitive, reflexive and antisymmetric, i.e. it is a partial ordering.
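For the restricted case in which Φ, Ψ and Ξ are conjunctions of literals, the tautology tests reduce to per-attribute subset checks, because a literal A(α2) propositionally implies A(α1) iff α2 ⊆ α1. The sketch below (the representation and names are ours, under exactly that assumption) checks the relation ⊑:

SUPERCLASSES = {              # quantifier class -> classes that contain it
    "A": {"A"}, "I": {"I", "A"},
    "DI": {"DI", "I", "A"}, "E": {"E", "I", "A"},
}

def implies(f2, f1):
    # pi(f2) -> pi(f1) is a tautology for conjunctions of literals iff every
    # literal A(alpha1) of f1 is covered by some A(alpha2) of f2 with
    # alpha2 a subset of alpha1.
    return all(attr in f2 and f2[attr] <= alpha1 for attr, alpha1 in f1.items())

def specializes(tau1, tau2):
    # True iff tau2 is a specialization of tau1 (tau1 ⊑ tau2).
    (phi1, psi1, xi1, c1), (phi2, psi2, xi2, c2) = tau1, tau2
    return (implies(phi2, phi1) and implies(psi2, psi1)
            and implies(xi2, xi1) and c1 in SUPERCLASSES[c2])

tau1 = ({"diast1": set(range(70, 100))}, {"chlst": set(range(150, 300))}, {}, "A")
tau2 = ({"diast1": set(range(75, 86)), "height": set(range(170, 181))},
        {"chlst": set(range(180, 201))}, {}, "I")
print(specializes(tau1, tau2))  # True: tau2 specializes tau1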
5.5 Possible Applications of the Formal Approach

It is possible to include formal descriptions of web pages' content within the pages' source code [4]; therefore, the formalisation of KDD tasks indicated above can be used for efficient content-based on-line retrieval of KDD tasks and of the discovered knowledge [3]. For example, we can have queries such as: "Given a pseudo-rule τ1, find all analyses that solve some task τ2 such that τ1 ⊑ τ2". In a similar way, we could retrieve whole sections of web presentations of KDD tasks and results (i.e. "Given τ1, find within the WWW presentation's tree a sub-tree described by τ2 such that τ1 ⊑ τ2"). The formal approach would also allow for automated support of the integration of WWW presentations of KDD research projects (i.e. appending one tree to another; again the ⊑ relation is utilized).

Also, we can make use of the logical properties of 4ft-rules themselves. In [1], 4ft-association rules were developed as expressions of special logical calculi that valuate in the analysed data matrices. Furthermore, deduction was defined on 4ft-association rules. Therefore, given a 4ft-association rule ϕ ≈ ψ, we can ask whether there was an analysis performed which proved a set of 4ft-association rules O such that O ⊨ ϕ ≈ ψ (see also [5]).

6 Conclusions

We have recognized the need for transparency and reuse of the knowledge discovered in the Challenge data in the course of the STULONG project. These needs determine the design of the WWW presentation of the discovered knowledge. To further enhance the possibilities of knowledge reuse, we have developed formal means of description of the underlying KDD task hierarchy.

The research has been supported by projects LN00B107 and ZA471011 of the Ministry of Education of the Czech Republic.

References

1. Hájek, P. – Havránek, T.: Mechanising Hypothesis Formation – Mathematical Foundations for a General Theory. Springer-Verlag, 1978, 396 p. Available online at http://www.cs.cas.cz/vvvvedci/hajek/guhabook
2. Ivánek, J.: On the Correspondence between Classes of Implicational and Equivalence Quantifiers. In Principles of Data Mining and Knowledge Discovery. Eds. Zytkow, J., Rauch, J. Berlin, Springer-Verlag 1999, pp. 116–124
3. Lín, V., Rauch, J., Svátek, V.: Content-based Retrieval of Analytic Reports. In: (Schroeder, M., Wagner, G., eds.) International Workshop on Rule Markup Languages for Business Rules on the Semantic Web. Sardinia, Italy, June 2002, pp. 219–224
4. Lassila, O., Swick, R. R.: Resource Description Framework (RDF) Model and Syntax Specification. Recommendation REC-rdf-syntax-19990222, W3C, February 1999
5. Rauch, J.: Logical Calculi for Knowledge Discovery in Databases. In Principles of Data Mining and Knowledge Discovery, Springer-Verlag, 1997
6. Rauch, J.: Classes of Four-Fold Table Quantifiers. In Principles of Data Mining and Knowledge Discovery, (J. Zytkow, M. Quafafou, eds.), Springer-Verlag, 1998
7. Rauch, J. – Šimůnek, M.: Mining for 4ft Association Rules by 4ft-Miner. In: INAP 2001, The Proceedings of the International Conference on Applications of Prolog. Prolog Association of Japan, Tokyo, October 2001, pp. 285–294
