Ontology Learning for the Semantic Web
Raising pet birds (an example domain)
Binary relations between terms
Atomic role, e.g. rearing-environment depend(_, _); inverse role, e.g. environment-that-rears depend⁻(_, _); others, such as composite binary relations R∘S
Example 1: an ontology for a "file system" (based on description logic, DL)
Axiom statements of the file-system ontology
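To make the slide concrete, here is a small set of illustrative DL axioms for a file-system ontology. The concept and role names (FSObject, contains, and so on) are assumptions for illustration, not taken from the original slides; they show an atomic role, an inverse role, and a role composition of the kind mentioned above.

```latex
% Illustrative file-system TBox (names are assumptions)
\mathit{File} \sqsubseteq \mathit{FSObject} \qquad
\mathit{Directory} \sqsubseteq \mathit{FSObject} \qquad
\mathit{File} \sqcap \mathit{Directory} \sqsubseteq \bot
% atomic role "contains" restricted to file-system objects
\mathit{Directory} \sqsubseteq \forall \mathit{contains}.\mathit{FSObject}
% inverse role: the root is a directory contained in no directory
\mathit{Root} \equiv \mathit{Directory} \sqcap \neg\exists \mathit{contains}^{-}.\mathit{Directory}
% role composition (R \circ S): indirect containment
\mathit{contains} \circ \mathit{contains} \sqsubseteq \mathit{containsIndirectly}
```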
Characteristics of Web Ontology
Ontology vs. Web ontology
Step 1
Limitless uses; constructing an ontology; description logic (DL), with examples
Step 2
Step 3
A Web of Ontologies: main challenges
references
Ontology
"Ontology" originally denotes the philosophical theory that studies the nature of "being" and its intrinsic relations; an ontology is a kind of meta-theory.
Examples: Dangdang, Google
An ontology delimits a scope of knowledge:
Delimit the application scope of the knowledge;
Delimit the abstract levels of the knowledge.
The knowledge scope of an application domain, e.g. the microelectronics industry or the automotive industry;
The common terms involved in the application domain, e.g. attribute terms such as size, heat dissipation, and speed. Examined more deeply at the semantic level, these need to be defined at a more basic level of abstraction; to that end, a new scope of knowledge must be delimited one abstraction level up.
Role: a binary predicate R; atomic role; inverse role; ...
Descriptions of concrete objects and their attribute values
Assertional axioms
TBox / ABox
TBox: a set of concept-definition statements
ABox: a set of descriptions of concrete objects and their attribute values
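The TBox/ABox split above can be sketched in a few lines. This is a minimal illustration, not a real DL reasoner; the concept and individual names are invented for the example.

```python
# Minimal sketch of a TBox (concept definitions) and an ABox (assertions
# about individuals). Names are illustrative, not from the slides.

TBOX = {  # atomic subsumptions: child concept -> parent concept
    "Canary": "Bird",
    "Bird": "Animal",
    "Cat": "Animal",
}

ABOX = {  # individual -> asserted (most specific) concept
    "tweety": "Canary",
    "tom": "Cat",
}

def subsumers(concept):
    """All concepts subsuming `concept`, following the TBox hierarchy up."""
    out = [concept]
    while concept in TBOX:
        concept = TBOX[concept]
        out.append(concept)
    return out

def instance_of(individual, concept):
    """ABox instance checking against the TBox taxonomy."""
    return concept in subsumers(ABOX[individual])
```

With these two structures, instance checking reduces to walking the concept hierarchy: `instance_of("tweety", "Animal")` holds because Canary is subsumed by Bird, which is subsumed by Animal.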
An overview of ontology research topics (original)
3. Ontology application research
(1) Ontology retrieval/browsing (ontology retrieval, ontology search, ontology navigation). Semantic-Web ontology browsing (navigation, surfing) and retrieval (search) follow six models: semantic-Web search engines, semantic-Web navigation tools, semantic-Web knowledge repositories, specialized RDF data collections, ontology-annotation retrieval models, and domain-ontology retrieval models.
(2) When one part of an ontology library changes, how can the other parts change with it automatically (the Macau project, for instance, could be solved this way)? This also covers problems of logical inconsistency or logical conflict (though not solved through ontology-reasoning methods).
● Automated/semi-automated ontology construction from relational-database data
(3) Ontology mapping/integration (ontology mapping, ontology alignment, ontology merging, semantic integration and interoperability)
A discussion about ontology
Dong Yunwei: you asked about "ontology". Its literal translation is the philosophical theory of existence or being; it is now very popular as a basis for conceptual models of systems.
The rough idea is that the objective world is composed of many elements, and the elements stand in various relations to one another; drawing these elements and relations out gives you an ontology.
Here are a few papers for reference, all from international conferences in 2002, so fairly recent.
1. 25030001 Conceptual Modeling and Ontology: Possibilities and Pitfalls
2. 25030003 An Ontology for m-Business Models
3. 25030012 Ontology-Driven Conceptual Modeling: Advanced Concepts
4. 25120174 DAML+OIL: A Reason-Able Web Ontology Language
— Hao Kegang
January 15, 2003. Sent: Wednesday, January 15, 2003 8:49 PM. Subject: RE: about ontology. Prof. Hao and all: The 10th International Conference on Human-Computer Interaction (2003) has just accepted a paper of mine, "Ontology-based Conceptual Modeling for Interaction Design"; it got full marks for both novelty and quality :-) Its content in fact concerns the conceptual modeling of software systems, though it bears some relation to modeling interaction.
I argued with Dong Yunwei for two hours, but he would not believe me.
The common definition of ontology is: a formal, explicit specification of a shared conceptualization (though this is disputed); its content comprises a concept taxonomy, relations, and axioms.
An ontology is generally static and does not include dynamic concepts.
In other words, an ontology describes declarative knowledge, not procedural knowledge, because the purpose of an ontology is to represent knowledge, not to use it.
"Shared conceptualization" means an abstract model of the phenomena in a problem domain whose concepts are commonly agreed on; "formal" means machine-processable; "explicit" means that the types of the concepts and the constraints on their use are explicitly defined (generally there must be a meta-ontology, also called ontology assumptions, that defines the concept types and the relations between types; the concepts and relations of a concrete conceptual model are then its instances).
Ontology modeling of domain knowledge in the semantic learning Web
Journal of Zhejiang University (Engineering Science), Vol. 43, No. 9, Sep. 2009. Received: 2008-06-02. Supported by the National "Eleventh Five-Year" Major Science and Technology Project (2001BA101A08-03). DOI: 10.3785/j.issn.1008-973X.2009.09.008. First author: OUYANG Yang (b. 1982), female, from Changsha, Hunan; Ph.D. candidate working on the semantic learning Web and ontology modeling (e-mail: oyy.lily@gm). Corresponding author: ZHU Miaoliang, male, professor and doctoral supervisor (e-mail: zhu m@).

Ontology modeling of domain knowledge in the semantic learning Web

OUYANG Yang, CHEN Yufeng, CHEN Xiyuan, ZHU Miaoliang (College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China)

Abstract: A methodology of ontology modeling for the educational domain, based on knowledge engineering, was proposed to address the difficulty and poor accessibility of constructing domain-knowledge ontologies across subjects. A complete ontology structural model is built by defining the scope of the domain knowledge, extracting concepts and terms from it, and then defining a hierarchy and constructing a relation model over the classified concept set. The method not only simplifies the ontology-modeling process but also enables subject-domain experts to develop course-related ontologies independently. A Unified Modeling Language (UML) course is used as an example to demonstrate the course-ontology development process and verify the feasibility of the method.

Key words: semantic Web; ontology; domain knowledge; ontology modeling

Berners-Lee et al. [1] proposed the concept of the semantic Web: an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.
The semantic Web enables machines to process information automatically and provides intelligent services such as information agents, retrieval agents, and information filtering [2]. Its most important enabling technology is the ontology: an ontology supplies the semantic Web with a shared set of terms and information structures, through which heterogeneous information from multiple sources can be made homogeneous, making communication and interoperation on the semantic Web possible. Stutt et al. [3] first proposed the semantic learning Web, pointing out that in such an environment a core task is to use ontologies to abstract the concept descriptions in existing educational resources into domain knowledge. For a given knowledge domain, an ontology defines the vocabulary that describes and represents the domain's knowledge, including machine-interpretable definitions of its basic concepts and the relations among them [4]. Ontology technology is increasingly applied in educational technology [5,6]; Sampson et al. [7] note that ontologies support building learning content from knowledge domains represented by multiple subject-domain ontologies: the descriptions of knowledge concepts and relations make the formulation of learning content more standardized and clearer, and learning content described through the ontology's learning topics, instructions, and user knowledge domains can be reused. This paper reviews the state of ontology construction, proposes a development method for subject-knowledge ontologies tailored to the characteristics of subject knowledge in education, and gives an ontology-construction example for a Unified Modeling Language (UML) course.

1 The concept of ontology and methodologies

1.1 The concept of ontology

There is still no unified definition of ontology. In artificial intelligence, Neches et al. [2] gave the earliest definition: an ontology defines the basic terms and relations comprising the vocabulary of a topic area, as well as the rules for combining terms and relations to define extensions to the vocabulary. An ontology is the result of applying ontological analysis and modeling to some domain: it abstracts a domain of the real world into a normalized description of a set of concepts and the relations among them, sketches the domain's basic knowledge system, and supplies terms for describing domain knowledge. It defines the domain's basic concepts in machine-readable form and also covers the relation model among those concepts. A domain ontology typically defines the concepts of a particular topic and their relations, rather than merely some universal, general concepts. Guarino [8] subdivided ontologies by their degree of domain dependence into four kinds: 1) top-level ontologies describe the most general concepts and their relations, such as space, time, event, and action, independent of any particular application; the other kinds are specializations of this kind; 2) domain ontologies describe the concepts and relations of a specific domain (e.g. medicine, automobiles, food); 3) task ontologies describe the concepts and relations of a specific task or activity; 4) application ontologies describe concepts and relations that depend on both a specific domain and a specific task. The educational subject ontologies studied in this paper are domain ontologies.

1.2 Methodologies for building ontologies

Many ontologies have been developed, and since each responds to its own problem domain and project, their construction processes differ. To guide ontology construction, researchers have proposed useful standards, such as the TOVE ontology-development method summarized by Guarino [8] from the TOVE (Toronto Virtual Enterprise) research project: its aim is to build enterprise ontologies that provide shared terminology for enterprise applications, defining each term in first-order predicate logic, implementing semantic constraints with a set of Prolog axioms, and defining a notation for depicting terms and concepts graphically. López et al. [9] proposed METHONTOLOGY, which, like software-engineering methods, divides ontology development into specification, knowledge acquisition, conceptualization, integration, implementation, evaluation, and documentation. Knight et al. [10] proposed developing ontologies with the KACTUS method, which follows an engineering style expressed in a conceptual modeling language (CML) to build ontologies that support the reuse of product knowledge, integrating computer-integrated manufacturing methods with knowledge engineering. The SENSUS ontology [11], developed by the natural-language group of the ISI Information Sciences Institute, mainly provides a broad conceptual structure for machine translation and is currently used chiefly in military-domain ontologies.
Uschold et al. [12] proposed the skeleton method, built on an enterprise ontology (a collection of terms and definitions shared among related enterprises); it provides only guidelines for ontology development. Ontology development requires the joint participation of domain experts and ontology engineers from computer science, which makes it difficult and hard to popularize. The diversity of knowledge means that the knowledge of different subject domains (concepts, relations among concepts, and so on) each has its own characteristics, while knowledge within one domain comes at different granularities and levels of difficulty; finding efficient, feasible modeling methods for constructing ontologies across subject domains is therefore the focus of this research.

2 A development methodology for subject-knowledge ontologies

The structure of an ontology is a five-tuple O := {C, R, H_C, Rel, A_O}, where C and R are two disjoint sets whose elements are called concepts and relations respectively; H_C is the concept hierarchy, i.e. the taxonomic relation among concepts; Rel denotes the non-taxonomic relations among concepts; and A_O denotes the ontology's axioms [13]. From this structure, the tasks of ontology learning are the acquisition of concepts, of relations among concepts (both taxonomic and non-taxonomic), and of axioms; these three kinds of learning targets form a hierarchy from simple to complex. Since concepts and relations are the two most basic elements, the most important work in building an ontology is to extract concepts from the knowledge domain, organize them into a hierarchy, and construct the relation model among them. Noy et al. [14] proposed a knowledge-engineering method that completes ontology development in seven steps: 1) determine the domain scope and purpose of the ontology; 2) reuse existing ontologies; 3) enumerate the important terms in the ontology; 4) define the classes and the class hierarchy; 5) define class properties; 6) define the value ranges of the class properties; 7) create instances. Steps 4)-6) usually proceed together and feed each other; deciding whether a given term is a class or a class property is a complex task. Building on this knowledge-engineering method and on the ontology structure above, this paper introduces the notion of a relation model and adds a step for constructing it, which makes it easier to build the topology of the knowledge space.

2.1 Determining the ontology's knowledge domain

The first step is to determine the ontology's domain scope and purpose: by understanding what the ontology covers, how it will be used, and by whom, one fixes its subject classification, application background, and intended users. In computer science, for example, IEEE/ACM published the Computing Curricula report CC2001 [15], which covers a reference structure for undergraduate courses in computing. CC2001 divides the computing discipline into 14 subareas, including discrete structures, principles of programming, algorithms and data structures, programming fundamentals, operating systems, human-computer interaction, graphics, intelligent systems, information systems, net-centric computing, software engineering, computational science, and social and professional issues. A single teaching topic can have different application backgrounds and audiences; CC2001 describes three levels of difficulty (introductory, intermediate, and advanced) aimed at students of different years and learning abilities.

2.2 Reusing existing ontologies

Before developing a new ontology one can learn from ongoing or completed related work and extract from and extend existing resources [14]; improving an existing ontology is much easier than building a new one. Reuse becomes essential when our system must interoperate with applications that already commit to a particular ontology or vocabulary. Many mature ontology resources are available on the Web, such as the Ontolingua ontology library, the DAML ontology library, and WordNet, along with public commercial resources such as UNSPSC, RosettaNet, and DMOZ.

2.3 Extracting domain concepts; defining the hierarchy and the relation model

Concept extraction lists the most basic and most representative terms of the domain: the concepts users need to know and learn, and the vocabulary needing annotation and explanation. In this step one only enumerates all possible terms, without worrying whether their meanings overlap or what relations and properties they have. The extracted concepts are then organized from a taxonomic point of view into a systematic classification, following basic principles such as: satisfy the fundamentals of taxonomy, distinguish and relate parent and sibling concepts clearly, and satisfy the necessary semantic constraints [16]. Of the many ways to define the concept hierarchy [10], two are most common: 1) top-down, starting from the most general concepts of the domain and proceeding to more specialized ones; 2) bottom-up, starting from the most detailed, lowest-level concepts (the leaves of the hierarchy) and grouping them into more general, higher-level concepts. Next, the relation model among concepts is defined, covering both taxonomic and non-taxonomic relations; it links more concepts together and extends the topology of the knowledge space. Finally the steps are combined to define the complete ontology structural model, including class properties, value ranges, and associations between classes.

2.4
Analysis, refinement, and evaluation. Refinement is in fact part of construction: the original structure is improved continuously while being built, and the overall structure emerges through continuous improvement. Refinement methods include merging, editing, and some natural-language-processing techniques; throughout, the overall consistency of the classification system must be maintained. Analysis and evaluation determine whether the ontology structure accurately reflects the nature of things and their relations, and whether the ontology system is consistent, relatively complete, and free of redundancy. Analysis, evaluation, and refinement together constitute ontology maintenance.

3 An example: developing a UML course ontology

3.1 Domain and scope of the ontology

We take UML as the example. UML is a modeling language: a method for specifying, visualizing, and documenting the artifacts of object-oriented system development. After fixing the domain, the ontology's intended users must be defined. CC2001 defines three course levels: introductory, intermediate, and advanced. Introductory computing courses mainly aim to: 1) introduce students to a range of fundamental computer-science concepts; 2) help students build cognitive models on top of those concepts; 3) encourage students to learn techniques for putting the conceptual knowledge model to use. The UML ontology developed here targets introductory courses. With domain and scope fixed, the knowledge sources must be selected; we chose several classic UML books, such as The Unified Software Development Process by the three UML founders Ivar Jacobson, Grady Booch, and James Rumbaugh; UML Distilled: A Brief Guide to the Standard Object Modeling Language by Martin Fowler; and The Unified Modeling Language Reference Manual by James Rumbaugh et al. Classic books are a good source for course-ontology development [17]: as introductory texts they are a natural starting point for a course, covering the field's basic concepts and related knowledge with thorough, detailed explanations. Since the aim here is to demonstrate course-ontology construction, existing ontology libraries (see Section 2.2) are not used.

3.2 A systematic method for course-ontology modeling

3.2.1 Concept extraction and class hierarchy. First, the relevant concepts and terms the ontology will contain are collected and extracted from the selected textbooks, then discussed and filtered with domain experts. This step is the foundation of the whole modeling effort; enumeration is exhaustive, listing as many of the needed basic concepts and terms as possible. Table 1 shows part of the extracted vocabulary. The enumerated terms are then classified and layered top-down, and class properties and constraints are defined: the most general, most basic terms (e.g. diagram, view, element) are taken as base classes, which are then subdivided toward more detailed, lower-level terms with corresponding subclasses, properties, and constraints. Taxonomic relations are generally described with "Is-a". Figure 1 shows the hierarchy model of the model-element concepts.

Table 1. Concepts in the UML ontology (knowledge category: concepts and terms)
- Class diagram: class, static structure of a system, association, generalization, dependency, realization, interface
- Sequence diagram: interaction, object, message, activation
- Use-case diagram: use case, actor, system function, association, extend, include, use-case generalization
- Object diagram: example, dynamics, collaboration
- Collaboration diagram: collaboration, education, role, message
- Component view: component diagram, module, dependency
- Concurrency view: concurrency, dynamic diagrams, state diagram, sequence diagram, collaboration diagram, activity diagram, execution diagram, component diagram, deployment diagram
- Logical view: static structure, class diagram, object diagram, dynamic behavior, state diagram, sequence diagram, collaboration diagram, activity diagram
- Structural model elements: use case, interface, role, class, node, action, component, binary component, executable component, source component
- Behavioral model elements: decision, message, object, state
- Organizational model elements: package
- Annotation model elements: note, constraint

3.2.2 Defining the relation model. Clearly "Is-a" expresses only a simple taxonomic, parent-child relation, while the relations between classes in an ontology model are more complex and varied. The IEEE Learning Object Metadata (IEEE LOM) specification defines a relation model between learning objects [18], shown in Table 2. Similarly, object-oriented modeling defines several relation types, such as association, aggregation, dependency, and generalization, for which corresponding mappings can be found in LOM.

Table 2. Relations defined in IEEE LOM
- IEEE LOM relation names: ispartof; haspart; isversionof; hasversion; isformatof; hasformat; references; isreferencedby; isbasedon; isbasisfor; requires; isrequiredby
- Corresponding relation types in OOP: Aggregate (aggregation), Dependency, Generalization, Association

Figure 1. Hierarchy model of modeling elements

These relation types are defined on the basis of the Dublin Core standard; they usually come in pairs, making it convenient to create a one-way relation between two learning objects, so relations are directed. The diversity and complexity of knowledge give different subjects and courses their own characteristic relation models, so we divide the relation model into a basic relation model and special relation models: the basic model covers the fundamental connecting relations of a knowledge space, while special models are built for the particular relations of a specific subject and course. Relations are usually defined by verbs, prepositions, or phrases; Table 3 describes the relation model defined in the UML ontology. Once the usable relation model is defined, a richer and more complete ontology can be built; Figure 2 shows a part of the UML ontology using several of these relation types.

Table 3. Relations in the UML ontology (relation / inverse relation / definition)
- HasSubtype / Is-a: the subconcepts the described concept has; like "Is-a", used to build the hierarchy (basic relation).
- HasPart / IsPartOf: other resources that the described resource physically or logically contains (basic relation).
- ConsistOf: similar to HasPart, expressing whole-part composition; unlike HasPart, ConsistOf usually means one concept is composed of several instances of the same other concept (basic relation).
- IsBasisOf / BasedOn: the described concept forms the theoretical basis of another concept (basic relation).
- OppositeOf: the described concept is the opposite of another concept; this relation is bidirectional (basic relation).
- SpecializeOf: the described concept is a specialized form of another concept (basic relation).
- ExampleOf: the described concept is an example of another concept (basic relation).
- DescribedBy / Describes: one concept is described by another concept (basic relation).
- HasProperty: the described concept has a property described by another concept (basic relation).
- HasRelationOf: special to the UML ontology; UML includes the class "relationship" for the various relations among model elements, and this relation says the described concept has some relationship type.
- CarryOut: special to the UML ontology; describes performing an action.

Figure 2. UML ontology model (example)

3.2.3 Validating and maintaining the ontology model. Ontology modeling is a continuously iterated process of improvement and optimization, and evaluation and validation are interleaved throughout development. They proceed along two lines: 1) discussion with domain experts and researchers to validate, for the knowledge-representation part, the correctness and accuracy of the vocabulary and the feasibility of the relations between classes; 2) discussion with knowledge-engineering researchers on the feasibility of the modeling method and the correctness of the model itself, including consistency and completeness. The classic books used for modeling and broad Web resources can also be used to validate and complete the ontology. Validation covers the division and definition of classes, the class hierarchy model, and the class properties and relation model.
1) Check that the class hierarchy is reasonable. The hierarchy is expressed mainly by "Is-a" parent-child relations; check that there are no cycles and that sibling concepts at the same level have the same granularity.
2) Check that newly introduced classes are justified: for example, whether a concept should be a new class, the value of a property of some class, or an instance of a class.
3) Check that the class relation model is reasonable and free of redundancy: relations should be defined parsimoniously, sufficient to describe the relations between classes fully and accurately, without complex additional redundant relations.
Maintenance of the ontology model mainly concerns extensibility and portability. As information grows rapidly and knowledge keeps expanding, knowledge domains are continuously extended and refined, new knowledge is introduced and learned, and the existing knowledge structure is revised; the ontology model must therefore keep adding new terms and adjusting its class structure and relation model. This places high demands on class extraction, class division, and the class hierarchy, so that additions and modifications are convenient. For example, UML has produced successive new versions: UML 1.3 added the description of activity diagrams to UML 1.1 and UML 1.2; UML 1.1 defined two use-case relations, uses and extends, both stereotypes of generalization, whereas version 1.3 provides three relations: include (a stereotype of dependency that replaces the former role of uses), use-case generalization, and extend.

4 Conclusion

This paper proposed an ontology-modeling method suited to knowledge in the educational domain. By organizing and structuring the knowledge of each subject domain with ontology modeling, all kinds of learning resources can be associated with domain knowledge and extended to higher-level applications. A UML course served as the example to demonstrate the modeling process and validate the method. The method can also be extended to other areas, e.g.: 1) user ontologies, describing the properties of the subjects who use learning resources; 2) education-service-provider ontologies, describing the developers that provide educational services; 3) learning-platform ontologies, describing existing learning platforms; 4) learning-behavior ontologies, describing the properties of learning-related behaviors. Future work includes refining the subject-ontology modeling method, applying it to more subjects, and developing a personalized learning-resource system on top of the established subject ontologies.

References:
[1] BERNERS-LEE T, HENDLER J, LASSILA O. The semantic Web [J]. Scientific American, 2001, 284(5): 34-43.
[2] NECHES R, FIKES R E, GRUBER T R, et al. Enabling technology for knowledge sharing [J]. AI Magazine, 1991, 12(3): 36-56.
[3] STUTT A, MOTTA E. Semantic learning Webs [J]. Journal of Interactive Media in Education, 2004(10): 1-32.
[4] WALLER J C, FOSTER N. Training via the web: a virtual instrument [J]. Computers and Education, 2000, 35(2): 161-167.
[5] SANTOS J M, ANIDO L, LLAMAS M. Design of a semantic web based brokerage architecture for the E-learning domain: a proposal for a suitable ontology [C]// Proceedings of the 35th IEEE Frontiers in Education Conference. New York: IEEE, 2005, S3H: 18-23.
[6] MIZOGUCHI R, BOURDEAU J. Using ontological engineering to overcome common AI-ED problems [J]. International Journal of Artificial Intelligence in Education, 2000, 11(2): 107-121.
[7] SAMPSON D G, LYTRAS M D, WAGNER G, et al. Ontologies and the semantic web for E-learning [J]. Journal of Educational Technology and Society, 2004, 7(4): 26-142.
[8] GUARINO N. Semantic matching: formal ontological distinctions for information organization, extraction and integration [C]// Lecture Notes in Computer Science. Berlin: Springer, 2006: 139-170.
[9] LOPEZ M F, GOMEZ-PEREZ A, SIERRA J P, et al. Building a chemical ontology using METHONTOLOGY and the ontology design environment [J]. IEEE Intelligent Systems and Their Applications, 1999, 14(1): 37-46.
[10] KNIGHT K, CHANDER I, HAINES M, et al. Filling knowledge gaps in a broad coverage MT system [C]// Proceedings of the International Joint Conference on Artificial Intelligence. Montreal: Quebec, 1995: 1390-1397.
[11] KNIGHT K, WHITNEY R. Ontology creation and use: SENSUS [EB/OL]. [1997-08-28]. http://ww /natural-language/resources/sensus.html.
[12] USCHOLD M, KING M. Towards a methodology for building ontologies [C]// Proceedings of the Workshop on Basic Ontological Issues in Knowledge Sharing, International Joint Conference on Artificial Intelligence. Montreal: Wilson, 1995: 25-34.
[13] DU Xiaoyong, LI Man, WANG Shan. A survey on ontology learning research [J]. Journal of Software, 2006, 17(9): 1837-1847.
[14] NOY N F, MCGUINNESS D L. Ontology development 101: a guide to creating your first ontology [R]. Stanford: Stanford University, 2001.
[15] ACM. Final report of the joint ACM/IEEE-CS task force on Computing Curricula 2001 for computer science [EB/OL]. [2001-12-15]. http://www.acm.org/education/curricula.html#cc2001-fr.
[16] GU Fang. Research on the design of multi-disciplinary ontology systems [D]. Beijing: Institute of Computing Technology, Chinese Academy of Sciences, 2004: 14-16.
[17] BOYCE S, PAHL C. Developing domain ontologies for course content [J]. Educational Technology and Society, 2007, 10(3): 275-288.
[18] IEEE LOM. Draft standard for learning object metadata [EB/OL]. [2002-07-15]. http://ltsc.ieee.org/wg12/20020612-Final-LOM-Draft.html.
Advances in knowledge-graph-based knowledge management (three articles)
Part 1. With the rapid development of information technology, knowledge management has become an important means for enterprises to innovate and raise their competitiveness.
Traditional forms of knowledge management, such as plain documents and databases, can no longer meet actual needs.
The emergence of knowledge-graph-based knowledge management has brought new ideas and new momentum to enterprise knowledge management.
A knowledge graph is a graph structure, built with semantic-Web technology, in which nodes and edges represent entities and their relations.
A knowledge-management system built on a knowledge graph can not only store, organize, share, and disseminate an enterprise's internal knowledge effectively, but also provide the enterprise with comprehensive decision support and business analysis.
Knowledge-graph-based knowledge management has therefore become a research hotspot in management science, information science, and computer science.
Research on it has already made some key advances.
Specifically, progress shows in the following areas.
1. Knowledge representation and modeling, one of the cores of a knowledge graph. Knowledge modeling is the prerequisite of a knowledge graph: it is the basis for expressing and organizing an enterprise's internal knowledge in a common form.
Several institutions and researchers have proposed knowledge-modeling methods based on ontologies, natural-language analysis, and machine learning; with them, the various kinds of knowledge inside an enterprise can be described and expressed precisely.
2. Knowledge fusion and co-construction: collaborative knowledge management through the knowledge graph. How to fuse and jointly build knowledge inside an enterprise is a key problem of knowledge-graph-based knowledge management.
To solve it, researchers have proposed fusion and co-construction methods based on ontologies and on collaborative filtering.
Through these methods, staff from different business areas and departments can manage and share knowledge collaboratively via the knowledge graph, improving the quality and efficiency of the knowledge.
3. Knowledge mining and discovery: automated knowledge discovery through the knowledge graph. Because an enterprise's internal knowledge keeps growing, even in-house experts can hardly discover and exploit all of it effectively.
Automated knowledge discovery over the knowledge graph has therefore become another research hotspot.
For this problem, researchers have proposed discovery methods based on graph convolutional networks and on hierarchical clustering, among others.
With these methods, an enterprise can discover and mine latent, important knowledge, promoting innovation and development.
SWC classification
SWC classification (Semantic Web Challenge classification) is the scheme used to categorize entries in the Semantic Web Challenge.
The Semantic Web Challenge is an international competition intended to advance semantic-Web technology; it is held once a year, and participants demonstrate their innovation and technical strength by designing and implementing semantic-Web applications.
The SWC classification mainly serves to ease the evaluation and comparison of entries.
Different entries may involve different semantic-Web technologies and application areas, so they need to be classified for better assessment and judging.
The classification is mainly by the entry's application area, data source, and technical approach.
Several common SWC categories are introduced below.
1. Ontology-based applications: entries that use ontologies to describe domain knowledge and to integrate, reason over, and query data. Common applications include ontology construction, ontology matching, and ontology reasoning.
2. Linked-data applications: entries that use linked data to integrate and share data distributed over different sources. Participants design and implement linked-data models, queries, and visualizations.
3. Semantic Web services: entries focused on the discovery, composition, and execution of semantic Web services. Participants design and implement service description, matching, and invocation.
4. Semantic Web and machine learning: entries combining semantic-Web and machine-learning techniques for problems such as data mining, information retrieval, and recommendation. Participants design and implement integration methods and algorithms for the two.
5. Semantic Web and natural-language processing: entries combining the semantic Web with NLP. Participants design and implement semantic-Web-based natural-language understanding, question answering, and text analysis.
6. Semantic Web and the Internet of Things: entries focused on applications fusing the semantic Web with the IoT.
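The linked-data category above rests on a simple mechanism: triples from different sources integrate merely by sharing identifiers. The sketch below illustrates that idea with invented URIs; a real entry would use an RDF store rather than Python sets.

```python
# Toy linked-data integration: two sources describe the same subject URI,
# so taking the union of their triples links the descriptions. All URIs
# are invented for illustration.

source_a = [
    ("ex:tuna", "rdf:type", "ex:Species"),
    ("ex:tuna", "ex:habitat", "ex:MarineWater"),
]
source_b = [
    ("ex:tuna", "ex:catchStatistics", "ex:stats2004"),
]

def integrate(*sources):
    """Union of triple sets; shared subject URIs link the sources."""
    return set().union(*map(set, sources))

def describe(graph, subject):
    """All (predicate, object) pairs known for one subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)
```

After integration, `describe` sees properties contributed by both sources for `ex:tuna`, which is exactly the payoff of sharing URIs.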
Ontology Learning for the Semantic Web
2.3 Techniques for data import and processing
Steps for collecting, importing, and processing documents:
- Use an ontology-focused document crawler to collect relevant documents from the Web.
- Use natural-language-processing techniques to process the documents.
- Use a document wrapper to convert semi-structured documents (e.g. domain dictionaries) into a format the ontology-learning framework can recognize (e.g. RDF).
- Convert the processed documents into a format the ontology-learning algorithms can recognize.
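The "document wrapper" step can be pictured as a small mapping function. This is a hedged sketch: the entry format and predicate names (`ol:broaderTerm` etc.) are assumptions for illustration, not the framework's actual schema.

```python
# Sketch of a document wrapper: map one semi-structured dictionary entry
# (here a plain dict) into RDF-style triples that a learning framework
# could consume. Field and predicate names are assumptions.

def wrap_entry(entry):
    """Convert one dictionary entry into a list of (s, p, o) triples."""
    term = entry["term"]
    triples = [(term, "rdf:type", "ol:Term")]
    for broader in entry.get("broader", []):
        triples.append((term, "ol:broaderTerm", broader))
    if "definition" in entry:
        triples.append((term, "ol:definition", entry["definition"]))
    return triples
```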
Term extraction
Extraction of taxonomic relations: (1) using hierarchical clustering; (2) using pattern matching (dictionaries)
Extraction of non-taxonomic relations: using an association-rule mining algorithm
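The association-rule idea behind non-taxonomic relation extraction can be sketched briefly: keep term pairs whose co-occurrence support and confidence clear a threshold, and propose them as candidate relations. The documents, terms, and thresholds below are invented for illustration.

```python
# Minimal association-rule sketch for non-taxonomic relation extraction:
# over documents represented as term sets, a rule a -> b is kept when its
# support (fraction of documents containing both) and confidence
# (co-occurrences divided by occurrences of a) exceed thresholds.
from itertools import permutations

docs = [
    {"hotel", "accommodation", "price"},
    {"hotel", "accommodation"},
    {"hotel", "season"},
    {"season", "price"},
]

def rules(transactions, min_support=0.5, min_confidence=0.6):
    n = len(transactions)
    terms = set().union(*transactions)
    found = []
    for a, b in permutations(sorted(terms), 2):
        both = sum(1 for t in transactions if a in t and b in t)
        only_a = sum(1 for t in transactions if a in t)
        support = both / n
        confidence = both / only_a if only_a else 0.0
        if support >= min_support and confidence >= min_confidence:
            found.append((a, b, support, confidence))
    return found
```

On this toy corpus the surviving pair is hotel/accommodation, which an ontology engineer would then inspect and name as a relation.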
2.4 Ontology-learning algorithms
Ontology-maintenance algorithms
Ontology pruning (discovering and removing irrelevant concepts): (1) baseline pruning; (2) relative pruning
Ontology refinement (fine-grained tuning and incremental extension of the ontology)
The main idea is to first find the unknown terms, then find concepts in the ontology similar to them and present these to the user, who finally decides the meaning of each unknown term.
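Both maintenance steps can be sketched in a few lines. The slides only hint at the details, so the following is an assumption-laden illustration: baseline pruning is read as dropping concepts below a corpus-frequency threshold, and the refinement step uses plain string similarity as a stand-in for whatever similarity measure the real system employs.

```python
# Sketches of the two maintenance steps above. The frequency threshold and
# the use of string similarity are assumptions for illustration.
import difflib

def baseline_prune(concept_freq, threshold):
    """Baseline pruning: keep concepts at or above a frequency baseline."""
    return {c for c, f in concept_freq.items() if f >= threshold}

def suggest_similar(unknown_term, ontology_concepts, n=3):
    """Refinement: propose the closest known concepts for an unknown term;
    the final decision is left to the user, as described above."""
    return difflib.get_close_matches(unknown_term, ontology_concepts,
                                     n=n, cutoff=0.5)
```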
FCA-Merge (step 3): generating the new ontology from the concept lattice
Merging example: Hotel in ontology 1, Hotel in ontology 2, and Accommodation in ontology 2 are merged, generating new concepts or relations.
Summary of the FCA-Merge algorithm
Input: two ontologies and a natural-language document collection. Output: a merged ontology. The input data must meet the following requirements: the document collection should be relevant to each source ontology; it should cover all concepts in the source ontologies; and it should separate the concepts well.
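The core of FCA-Merge is formal concept analysis over a documents-by-terms context. The sketch below derives all formal concepts (maximal extent/intent pairs) of a tiny invented context by brute force; real FCA-Merge builds this lattice from the documents annotated with both ontologies' concepts and is far more efficient.

```python
# Brute-force formal concept analysis over a small context
# (documents x terms). The context is invented for illustration.
from itertools import chain, combinations

context = {  # document -> terms drawn from both source ontologies
    "d1": {"Hotel", "Accommodation"},
    "d2": {"Hotel"},
    "d3": {"Accommodation", "Campground"},
}

def formal_concepts(ctx):
    """All formal concepts as (extent, intent) pairs of frozensets."""
    objects = set(ctx)
    all_terms = set(chain.from_iterable(ctx.values()))
    concepts = set()
    for r in range(len(objects) + 1):
        for objs in combinations(sorted(objects), r):
            # intent: terms shared by every chosen document
            intent = (set.intersection(*(ctx[o] for o in objs))
                      if objs else set(all_terms))
            # extent: every document carrying the whole intent
            extent = {o for o in objects if intent <= ctx[o]}
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts
```

The resulting concepts form the lattice from which step 3 above derives the merged ontology, e.g. the concept ({d1}, {Hotel, Accommodation}) suggests that the two source concepts overlap.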
3. Ontology evaluation
Precision: the learned (generated) ontology is compared against a manually built reference ontology:

precision_OL = |Comp ∩ Ref| / |Comp|

Recall:

recall_OL = |Comp ∩ Ref| / |Ref|

where Ref is the set of elements of the reference ontology and Comp is the set of elements of the compared (learned) ontology.
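The two formulas are straightforward to compute once both ontologies are flattened into element sets; the sets below are invented for illustration.

```python
# Precision and recall of a learned ontology (Comp) against a manually
# built reference ontology (Ref), as sets of elements.

def precision_recall(comp, ref):
    overlap = comp & ref
    return len(overlap) / len(comp), len(overlap) / len(ref)
```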
Research status of knowledge graphs
With the continuous development of computer technology, knowledge graphs have become a hot topic in information-technology research.
A knowledge graph is a complex directed graph with semi-structured or unstructured sets of entities and relations.
It expresses entities and the relations between them in a highly modeled form, allowing computers with memory to compute more valuable information and insight efficiently.
Research and learning on knowledge graphs involves many advanced machine-learning techniques, including artificial neural networks (ANN), deep learning (DL), and natural-language processing (NLP), as well as traditional knowledge-representation techniques, including ontologies, the semantic Web, and representation learning.
With the help of these techniques, a knowledge graph can describe reality more accurately and reliably, lowering the barriers to knowledge representation and computation, improving algorithmic accuracy, and raising service quality.
As of 2018, knowledge-graph research had developed considerably.
The artificial-intelligence community, especially in deep learning and natural-language processing, has put its spotlight on knowledge graphs and invested resources actively.
In recent years, knowledge-graph reasoning methods based on deep neural networks have also made great progress, for example KG2E (knowledge-graph embedding), KBA (knowledge-base alignment), KBC (knowledge-base completion), and KGE (knowledge-graph expansion), greatly broadening the range of knowledge-graph applications.
In addition, with the rise of Web semantic technologies such as OWL (Web Ontology Language), RDF (Resource Description Framework), and SPARQL (SPARQL Protocol and RDF Query Language), knowledge-graph research has made notable progress in areas such as knowledge discovery, knowledge representation, and knowledge-base construction.
These technologies play an important role in structured knowledge-graph research; Web-based knowledge graphs have evolved from structured knowledge warehouses into a much broader knowledge network.
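The RDF/SPARQL pairing mentioned above boils down to triples plus pattern matching. The toy below answers a single triple pattern over invented data, with `None` playing the role of a SPARQL variable; a real system would of course use an RDF store and actual SPARQL.

```python
# Toy triple store with a single-pattern query, illustrating the RDF/SPARQL
# idea. Data and URIs are invented for illustration.

triples = [
    ("kg:Ontology", "kg:usedBy", "kg:SemanticWeb"),
    ("kg:KG2E", "kg:taskType", "kg:Embedding"),
    ("kg:KBC", "kg:taskType", "kg:Completion"),
]

def match(pattern, data=triples):
    """Return all triples matching (s, p, o); None acts as a variable."""
    s, p, o = pattern
    return [t for t in data
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]
```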
The knowledge-graph evolution of Ontology research (Li Guohong)
High-frequency keywords and their counts:

| Rank | Keyword | Count | Rank | Keyword | Count | Rank | Keyword | Count |
|------|---------|-------|------|---------|-------|------|---------|-------|
| 4 | information retrieval | 54 | 14 | OWL | 15 | 24 | semantics | 6 |
| 5 | Ontology | 47 | 15 | knowledge management | 13 | 25 | comparative research | 6 |
| 6 | digital libraries | 42 | 16 | semantic Web (语义Web) | 12 | 26 | knowledge bases | 6 |
| 7 | knowledge organization | 28 | 17 | information organization | 11 | 27 | libraries | 6 |
| 8 | semantic web (语义网) | 25 | 18 | knowledge ontology | 11 | 28 | visualization | 6 |
| 9 | thesauri | 24 | 19 | ontology learning | 11 | 29 | e-government | 6 |
| 10 | ontology construction | 23 | 20 | metadata | 9 | 30 | knowledge representation | |
(Author bios, continued) Research interest: land management. Wu Longlong (b. 1988), male, master's student; research interest: library and information technology. Zhao Yi (b. 1989), male, undergraduate.

Journal of Intelligence (情报杂志), Vol. 32

…avoids overlapping node labels and generates clustering entry files. The two tools each have their own characteristics and emphases, and their analysis results can cross-validate each other. 512 papers entered the bibliometric analysis; to avoid the adverse effect of non-standard keywords on the study, near-synonyms and overly general terms among the keywords were […]

Li Guohong (1), Liang Baocheng (2), Zhao Yi (2), Wu Longlong (2)
(1. Library, Sichuan University, Chengdu 610064; 2. College of Public Administration, Sichuan University, Chengdu 610064)
Research on the application of Ontology in the semantic Web

Received: 2003-04-12; revised: 2003-07-03.

Research on the Application of Ontology in the Semantic Web
DENG Fang (School of Computer Science and Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China)

Abstract: The technology of ontology and the semantic Web is surveyed, and the role of ontology in the semantic Web is described. That role is studied in two concrete applications, information retrieval and B2B electronic business, and issues to note in implementation are explained in the end.
Key words: ontology; semantic Web; information retrieval; B2B
CLC number: TP30112; Document code: A; Article ID: 1001-3695(2004)06-0097-02

1 The Semantic Web

The Internet and the Web have become indispensable means and tools for people to obtain and publish information, but the enormous information network they form also brings users many problems and frustrations.
WonderWeb Deliverable D16

Reusing semi-structured terminologies for ontology building: A realistic case study in fishery information systems

Aldo Gangemi, ISTC-CNR (email: a.gangemi@r.it)

Identifier: D16; Class: Deliverable; Version: 1.0; Date: 7-05-2004; Status: public; Lead Partner: ISTC-CNR

IST Project 2001-33052 WonderWeb: Ontology Infrastructure for the Semantic Web

This document forms part of a research project funded by the IST Programme of the Commission of the European Communities as project number IST-2001-33052. For further information about WonderWeb, please contact the project co-ordinator: Ian Horrocks, The Victoria University of Manchester, Department of Computer Science, Kilburn Building, Oxford Road, Manchester M13 9PL. Tel: +44 161 275 6154; Fax: +44 161 275 6236; Email: wonderweb-info@

Table of Contents
1 Introduction
  1.1 Bootstrapping dedicated semantic webs
  1.2 A bit of history
2 The fishery case study: resources, issues, and methods
  2.1 Resources
  2.2 Some issues
  2.3 Some methods
3 KOS reengineering lifecycle
  3.1 Formatting and lifting
  3.2 Formalization, and core ontology building
  3.3 Modularization, and alignment
  3.4 Annotation, refinement, merging
4 Post-processing lifecycle
  4.1 Services for information retrieval
  4.2 Services for distributed database querying
  4.3 Tools
5 Further discussion on the case study and its relevance to the Semantic Web
6 References

1 Introduction

1.1 Bootstrapping dedicated semantic webs

A main issue in the deployment of the Semantic Web (SW) is currently its population: very few ontologies and tagged documents exist in comparison to the huge amount of domains and documents that exist on the Web. Several strategies are being exploited to bootstrap the SW: machine learning [1,2,3], NLP techniques [4,5], semantic services [6], lifting existing metadata [7,8,9,10,11,12], etc.
These strategies have different advantages according to the type of documents or domains: while machine learning and NLP techniques try to extract useful recurrent patterns out of existing (mostly free-text or semi-structured) documents, and semantic services try to generate semantically indexed, structured documents, e.g. out of transactions, existing metadata can be considered proto-ontologies that can be "lifted" from legacy indexing tools and indexed documents. In other words, metadata lifting ultimately tries to reengineer existing document management systems into dedicated semantic webs.[1]

Legacy information systems often use metadata contained in Knowledge Organization Systems (KOSes), such as vocabularies, taxonomies and directories, in order to manage and organize information. KOSes support document tagging (thesaurus-based indexing) and information retrieval (thesaurus-based search), but their semantic informality and heterogeneity usually prevent a satisfactory integration of the supported documentary repositories and databases. As a matter of fact, traditional techniques mainly consist of time-consuming, manual mappings that are made, each time a new source or a modification enters the lifecycle, by experts with idiosyncratic procedures. Informality and heterogeneity make them particularly hostile with reference to the SW.

This document describes the methodology used for the creation, integration and utilization of ontologies for information integration and semantic interoperability, with respect to a case study: fishery information systems. Such a case study, which is definitely not a toy example, has been the target of an institutional project carried out by CNR and UN-FAO, which exploited the DOLCE ontology and the methods developed within the WonderWeb project, as well as previous methodologies developed in the past by ITBM-CNR.[2] We describe various methods to reengineer, align, and merge KOSes in order to build a large fishery ontology library.
Some examples of semantic services based on it, either for a simple one-access portal or a sophisticated web application, are also sketched, which envisage a fishery semantic web. With respect to the main threads of WonderWeb (languages, tools, foundational ontologies, versioning, and modularity), we concentrate this section on a demonstration of KOS reengineering issues from the viewpoint of formal ontology; therefore the main threads will appear in the context of the case study description rather than as explicitly addressed topics. We assume a basic knowledge of deliverable D18 for full comprehension of this section.

[1] Notice that the different strategies are not mutually exclusive, but can be combined. In the FOS project, we have also used techniques from NLP and semantic services.
[2] The former ontology group of ITBM-CNR has now joined ISTC-CNR.

We thank the UN-FAO WAICENT-GILW department for allowing us to reuse in this deliverable some of the FOS project documentation.

1.2 A bit of history

In the beginning of 2002 the Food and Agriculture Organization of the United Nations (FAO, in the following), based in Rome, took action in order to enhance the quality of its information and knowledge services related to fishery. The following internal agencies were asked to participate in a task force by providing manpower and/or data, information, or knowledge repositories:
- The FAO Fishery Department provided the reference tables of its Internet portal, the Fishery Global Information System (FIGIS).
- The ASFA secretariat, the managing body of the Aquatic Sciences and Fisheries Abstracts, contributed its online thesaurus for fishery.
- SIFAR, the Support unit for International Fisheries and Aquatic Research, contributed the contents and the structure of the oneFish community directory.
- FAO WAICENT, the World Agricultural Information Centre, provided access, through its office for General Information Systems and Digital Libraries (GILW), to the fishery part of the AGROVOC thesaurus.

FOS naturally fitted the wider AOS (Agriculture Ontology Service) long-term programme, started by FAO at the end of 2001, of which FOS constitutes one major case study (together with the Food Safety project [12], and others). The scientific coordination and supervision of the FOS project was assigned to the Laboratory for Applied Ontology of the Institute of Cognitive Sciences and Technology of the Italian National Research Council (LOA, in the following). The outline of the project and the preliminary methods have already been presented in [13]. Here we describe some salient aspects of the FOS project after the completion of the first phase (2002-2003), which show the principles (and their applicability) that can be adopted when reengineering semi-structured KOSes into formal ontologies, in the formats and with the tools envisaged by the WonderWeb project. Section 2 describes the sources that were subject to reengineering, integration, alignment, and merging, and the general issues and principles. Section 3 presents the methodology in more detail and an outline of the global results, and provides some examples of the interoperability between the sources that was achieved.
Finally, section 4 draws some conclusions.

Footnotes:
1-2. /figis/servlet/FiRefServlet?ds=staticXML&xml=webapps/figis/wwwroot/fi/figis/index.xml&xsl=webapps/figis/staticXML/format/webpage.xsl
3. /fi/asfa/asfa.asp
4. /global/about.htm
5. /WAICENT/
6. /agris/aos
7. http://www.loa-cnr.it

2 The fishery case study: resources, issues, and methods

2.1 Resources

The following resources have been singled out from the fishery information systems considered:

OneFish topic trees. OneFish [14] is a portal for fishery activities and a participatory resource gateway for the fisheries and aquatic research and development sector. It contains heterogeneous data, organized through hierarchical topic trees (more than 1,800 topics, increasing regularly), made up of hierarchical topics with brief summaries, identity codes and attached knowledge objects (documents, web sites, various metadata). The hierarchy (average depth: 3) is ordered by (at least) two different relations: subtopic, and intersection between topics, the latter notated with @, similarly to relations found in known subject directories like DMOZ. There is one 'backbone' tree consisting of five disjoint categories, called worldviews (subjects, ecosystem, geography, species, administration), and one worldview (stakeholder), maintained by the users of the community, containing its own topics and topics that are also contained in the first four other categories (Figure 5). Alternative trees contain new 'conjunct' topics deriving from the intersection of topics belonging to different categories.

AGROVOC thesaurus. AGROVOC [15] was developed by FAO and the Commission of the European Communities in the early 1980s and is used for document indexing and retrieval. It is a multilingual, structured, and controlled vocabulary designed to cover the terminology of all subject fields of agriculture, forestry, fisheries, food, and related domains (e.g. environment) in order to describe documents in a controlled-language system. Different hierarchical and associative relations (broader/narrower terms, related terms, equivalent terms, used for) are established between the terms. AGROVOC contains approximately 2,000 fishery-related descriptors out of about 16,000 descriptors.

ASFA thesaurus. ASFA [16] is an abstracting and indexing service covering the world's literature on the science, technology, management, and conservation of marine, brackishwater, and freshwater resources and environments, including their socio-economic and legal aspects. The thesaurus is an online service which provides terminological definitions in terms of various relations, e.g. narrower term, related term, used for. It consists of more than 6,000 descriptors.

FIGIS reference tables. FIGIS [17] is a global network of integrated fisheries information. Presently its thematic sections are five: aquatic species (i.e. biological information); geographic objects (water and continental areas, political geographic entities); marine resources (information on the state of world resources, data on regional fish stocks, major issues affecting stocks); marine fisheries (data and maps on the exploitation of the major species, management-related information); and fishing technologies (information on high-seas vessel identification, on the selection of technologies, on training, and on international legal issues). The FIGIS reference tables comprise all the contents of this huge database. They consist of approximately 200 top-level concepts, with a maximum depth of 4, 30,000 'objects'
The original set included 823 elements with a rich attribute structure. Those related to fishery ontologies have been taken into account.

2.2 Some issues

As mentioned in the introduction, the sources to be integrated were rather varied from many perspectives (semantic, lexical and structural).

AQUACULTURE (AGROVOC): NT1 fish culture; NT2 fish feeding; NT1 frog culture; ...; rt agripisciculture; rt aquaculture equipment; ...; Fr aquaculture; Es acuicultura
AQUACULTURE (ASFA): NT Brackishwater aquaculture; NT Freshwater aquaculture; NT Marine aquaculture; rt Aquaculture development; rt Aquaculture economics; rt Aquaculture engineering; rt Aquaculture facilities
Biological entity (FIGIS): Taxonomic entity: Major group / Order / Family / Genus / Species; Capture species (filter); Aquaculture species (filter); Production species (filter); Tuna atlas spec
SUBJECT (OneFish): Aquaculture; Aquaculture development; Aquaculture economics @ Aquaculture planning

Table 1. Sample aquaculture descriptors in the four resources. (NT means narrower term; rt means related term; Fr and Es are the corresponding French and Spanish terms.)

An example of how formal ontologies can be relevant for fishery information services is given by the information that someone interested in aquaculture could get (Table 1). Beyond simple keyword-based searching, searches based on tagged content or sophisticated natural-language techniques require some conceptual structuring of the linguistic content of texts. The four systems concerned by this case study provide this structure in very different ways and with different conceptual "textures". For example (Table 1), the AGROVOC and ASFA thesauri put aquaculture in the context of different thesaurus hierarchies. The AGROVOC thesaurus seems to conceptualize aquaculture types from the viewpoint of techniques and species.
The ASFA aquaculture hierarchy is substantially different, since it seems to stress the environment and the disciplines related to aquaculture. A different resource is constituted by the so-called reference tables in the FIGIS system; the only reference table mentioning aquaculture puts it into yet another context (taxonomic species). The last resource examined is the OneFish directory, which returns a context related to economics and planning. With such different interpretations of aquaculture, we can reasonably expect different search and indexing results. Nevertheless, our approach to information integration and ontology building is not that of creating a homogeneous system in the sense of a reduced freedom of interpretation, but in the sense of navigating alternative interpretations, querying alternative systems, and conceiving alternative contexts of use. Once it is clear that different fishery information systems provide different views on the domain, we directly enter the paradigm of ontology integration, namely the integration of schemas that are arbitrary logical theories, and hence can have multiple models (as opposed to database schemas, which have only one model) [19]. As a matter of fact, the thesauri, topic trees and reference tables used in the systems to be integrated can be considered informal schemata conceived in order to query semi-structured or informal databases such as texts, forms and tagged documents. In order to benefit from the ontology integration framework, we must transform informal schemata into formal ones. In other words, thesauri and other terminology management resources must be transformed into formal ontologies. In order to do this, we require a comprehensive set of ontologies designed in a way that admits the existence of many possible pathways among concepts under a common conceptual framework.
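To make the divergence concrete, the four views of "aquaculture" can be sketched as plain data (a minimal illustration: the descriptor samples are paraphrased from Table 1, and the dictionary layout is our own assumption, not a structure used by the project):

```python
# Hypothetical mini-encoding of Table 1: the same term "aquaculture"
# has a different descriptor neighbourhood in each resource.
RESOURCES = {
    "AGROVOC": {"aquaculture": {"NT": ["fish culture", "frog culture"],
                                "rt": ["agripisciculture",
                                       "aquaculture equipment"]}},
    "ASFA":    {"aquaculture": {"NT": ["Brackishwater aquaculture",
                                       "Freshwater aquaculture",
                                       "Marine aquaculture"],
                                "rt": ["Aquaculture development",
                                       "Aquaculture economics"]}},
    "FIGIS":   {"aquaculture": {"context": ["Biological entity",
                                            "Taxonomic entity",
                                            "Aquaculture species (filter)"]}},
    "OneFish": {"aquaculture": {"NT": ["Aquaculture development",
                                       "Aquaculture economics"]}},
}

def contexts_for(term):
    """Collect, per resource, the descriptor neighbourhood of a term."""
    return {name: data.get(term, {}) for name, data in RESOURCES.items()}

views = contexts_for("aquaculture")
# Each resource yields a different context for the same keyword, which is
# why keyword-only search gives divergent results across the four systems.
```

Navigating these alternative interpretations, rather than collapsing them into one, is exactly the integration stance described above.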
In our opinion, the framework should:

— reuse domain-independent ontologies shared by the resources, in order to make the different components interoperate;
— be flexible enough that different views have a common context;
— be focused on the core reasoning schemata for the fishery domain, otherwise the common conceptual framework would be too abstract.

Domain-independent, foundational ontologies [18] characterise the general notions needed to talk about economics, biological species, and fish production techniques; for example: parts, agents, attributes, aggregates, activities, plans, devices, species, regions of space or time, etc.

Furthermore, so-called core ontologies [18] characterise the main conceptual schemata that the members of the fishery community use to reason, e.g. that certain plans govern certain procedures involving certain devices applied to activities like capturing fish of a certain species in certain areas of water regions, etc.

Foundational and core ontologies provide the framework to integrate, in a meaningful and intersubjective way, different views on the same domain, such as those represented by the queries that can be put to a set of distributed information systems containing (un)structured data.

2.3 Some methods

In order to perform this reengineering task, we have applied the techniques of three methodologies: application of the DOLCE foundational principles introduced in WonderWeb D18 [18], ONIONS [20], and OnTopic [21].

WonderWeb D18 contains principles for building and using foundational ontologies for core and domain ontology analysis, revision, and development. DOLCE is an axiomatic, domain-independent theory based on formal principles.

ONIONS is a set of methods for reengineering (in)formal domain metadata, such as glossaries, terminologies, data models, conceptual schemata, business models, etc.
to the status of formal ontology data types, for integrating them in a common formal structure, for aligning them to a foundational ontology, and for merging them. Some methods are aimed at reusing the structure of hierarchies (e.g., BT/NT relations, the subtopic relation, etc.) and the additional relations that can be found (e.g., RT relations), and at analysing the compositional structure of terms in order to capture new relations and definitional elements. Other methods concern the management of semantic mismatches between alternative or overlapping ontologies, and the exploitation of systematic polysemy to discover relevant domain conceptual structures.

OnTopic is about creating dependencies between topic hierarchies and ontologies. It contains methods for deriving the elements of an ontology that describe a given topic, and methods to build "active" topics that are defined according to the dependencies of any individual, concept, or relation in an ontology. OnTopic has only suggested design decisions in the case study.

In section 3, we describe these methods as used in the KOS reengineering lifecycle, the types of data extracted from the fishery resources, and examples of their porting, translation, transformation, and refinement.

In section 4 we finally give a resume of the tools tested and/or endorsed in the case study.

3 KOS REENGINEERING LIFECYCLE

In Figs. 1, 3, 6, 8 and 9, a UML "activity diagram" is shown that summarizes the main steps of the methods we have followed to create the Fishery Ontology Library (FishOL).
For the sake of readability, we have split the activity diagram into five pieces as follows:

1) Terminological database (TDB) formatting and schema lifting
2) TDB porting, formalization, and Core ontology building
3) Modularization, ontology library building, and alignment to reference ontologies
4) Annotation, refinement, and merging of the library
5) Measures for finalisation, maintenance, and exploitation

3.1 Formatting and lifting

In the first phase of the lifecycle (Fig. 1), the original terminological databases are imported into a common database format. The conceptual schemata of the databases are lifted (either manually, or by using automatic reverse-engineering components [11]). At the same time, a common Ontology Data Model (ODM) should be chosen. This can be partly derived from the semantics of ontology representation languages (e.g. the OWL ODM [22]), enhanced with criteria for distinguishing the different data types at the ontological level (e.g. individual, class, meta-property, relation, property name, lexicon, etc.). Ontologically explicit ODMs are described in [23,24]. With the help of the ODM, lifted schemata can be translated and then integrated (the integration methodology assumed is [19]).

In FOS, the original TDBs turned out to be syntactically heterogeneous, especially FIGIS with respect to ASFA and AGROVOC. In fact, the former is controlled through a set of XML DTDs (currently moving to RDFS), while the latter two are implemented in relational databases with one basic relational table.

Semantically, the TDB schemata are even more heterogeneous (see Table 1 for examples). ASFA is a typical thesaurus, made up of descriptors (equivalence classes of terms with the same assumed meaning), equivalent terms, and relations among descriptors (BT, NT, RT, UF) that create a forest structure (a directed acyclic graph [25]). Descriptors are encoded via a "preferred" term.
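The forest/DAG assumption for BT links can be checked mechanically; the sketch below (our own illustration, with made-up descriptor names) rejects any cycle among broader-term edges:

```python
from collections import defaultdict

def is_acyclic(bt_edges):
    """bt_edges: iterable of (narrower, broader) descriptor pairs.
    Returns True if the BT graph is a DAG, i.e. no descriptor is,
    directly or indirectly, broader than itself."""
    graph = defaultdict(list)
    for narrow, broad in bt_edges:
        graph[narrow].append(broad)
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)           # every node starts WHITE

    def visit(node):
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:     # back edge: a BT cycle
                return False
            if colour[nxt] == WHITE and not visit(nxt):
                return False
        colour[node] = BLACK
        return True

    return all(visit(n) for n in list(graph) if colour[n] == WHITE)

edges = [("Marine aquaculture", "aquaculture"),
         ("aquaculture", "aquatic sciences")]   # illustrative descriptors
```

A lifting pipeline could run such a check on each imported TDB before translation, so that malformed hierarchies are caught at phase 1 rather than during formalization.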
AGROVOC is also a thesaurus, but it contains multilingual equivalent terms, and descriptors are encoded via alphanumeric codes. FIGIS is not a thesaurus, but a collection of TDBs organised into modules containing different domain terminologies, e.g. vessels, organisms, techniques, institutions, etc. Equivalence classes of multilingual terms are defined (similar to thesaurus descriptors). Each equivalence class has an identification code. Each module has a peculiar schema including local relations defined on (classes of) terminological equivalence classes, e.g. a relation between institutions and countries, a relation between vessels and techniques, one between organism species and genera, etc.

Figure 1. A UML activity diagram for formatting and lifting activities.

These relations are more informative than generic RT thesaurus relations (see phase 2 about additional transforms to the TDBs).

The FIGIS DTDs encode heterogeneous metadata for the management of the FIGIS database. These XML elements can refer to domain-specific information (e.g. "Location"), datatypes (e.g. "Date"), data about data (e.g. "Available"), or foreign keys (e.g. "AqSpecies_Text").

Fig. 2. Topic spaces ("worldviews") in OneFish: Subjects, Ecosystem, Geography, Species, Administration, Stakeholders.

Finally, OneFish is a tree structure of subjects (keywords used to classify documents), with multihierarchical links, similar to Web directories like DMOZ [26]. The top subjects in OneFish are depicted in Fig. 2.

The integrated schema turns out to include all the data types from the TDBs. On the other hand, we needed to interpret the original data types into an (onto)logically valid integrated schema. Therefore, we have created a mapping from each (domain-related) legacy data type to an ODM data type, e.g.
owl:Class, to which "descriptor" and "FIGIS equivalence class" have been mapped; owl:ObjectProperty, to which "RT" and most FIGIS relations have been mapped (as instances); topic, to which the OneFish "subject" has been mapped; etc.

As explained below, some adjustments to the original TDBs are needed in order to preserve a correct semantics when translating some elements to the integrated schema.

3.2 Formalization, and Core ontology building

After a common format and an integrated ontology data model have been obtained, the second phase (Fig. 3) starts by choosing an Ontology Representation Language (ORL). In FOS, some tests were performed at the beginning of the project, and we decided to take a layered approach, maintaining the TDBs in different ontology repositories represented in languages of increasing expressivity. RDF(S) [27] has been chosen for the basic layer, DAML+OIL [28] (currently OWL-DL [22]) for the middle layer, and KIF [29] for the expressive layer.

The reasons for such layering reside in a) the necessity of carrying out certain ontology learning procedures (see phase 4) with the expressive version, b) the necessity of using the standard Semantic Web ontology languages to carry out inferences with the middle layer, and c) the necessity of maintaining a lightweight ontology with the basic layer. RDF(S) can also be used to import the original TDBs without using the ODM. In fact, a preliminary decision was required about how the ontologies obtained from the TDBs should be used.

The first choice has been to preserve the TDB elements in their original data models. In this case, no mapping is performed from the original data models to the ODM, and only an integrated (non-refined) data model is used.
The advantage of this choice is that no interpretation is performed on the legacy TDBs, but there are two disadvantages: translated TDBs are not (proto-)ontologies but RDF models, hence no ontology inferencing can be made using them; and imported TDBs cannot be aligned or merged, but only integrated.

The second choice has been to translate the TDBs according to the ODM, interpreting and mapping the original data models, and making the refinements needed to preserve the semantics of the ODM. This solution overcomes the disadvantages of the first choice at the cost of making interpretations. In FOS, the maintainers of the legacy TDBs are members of the task force, so we can expect that interpretations are not harmful. In other contexts, especially if experts are not collaborating, interpretations may be more problematic.

Figure 3. The activity diagram for metadata formalization and Core ontology building.

In the case study, the first choice has been easily produced through a rather economic procedure. Most effort has then been put into translating, and sometimes transforming, the TDBs into proto- and then full-fledged ontologies. In particular, a translation to ODM data types has been performed. (A similar problem is discussed in the W3C SW Best Practices and Deployment Working Group with respect to wordnets and thesauri [30].)

For certain terminological data types, a refinement is performed at this stage and after alignment (see phase 3). For example, AGROVOC makes no difference between descriptors denoting owl:Classes (e.g. agrovoc:River) and descriptors denoting owl:Individuals (e.g. agrovoc:Amazon). Most individuals have been found in subdomains like geography and institutions.
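A purely illustrative sketch of this class-vs-individual refinement (the gazetteer and helper below are our own assumptions, not the deliverable's tooling; agrovoc:Amazon and agrovoc:River are the only examples actually given in the text):

```python
# Hypothetical refinement step: descriptors known to name particular
# entities become individuals; all other descriptors remain classes.
# The gazetteer is an assumed, hand-maintained list.
KNOWN_INDIVIDUALS = {"Amazon", "Mediterranean Sea", "FAO"}  # assumption

def declare(descriptor, ns="agrovoc"):
    """Emit an OWL abstract syntax declaration for one descriptor."""
    name = f"{ns}:{descriptor.replace(' ', '_')}"
    if descriptor in KNOWN_INDIVIDUALS:
        return f"Individual({name})"
    return f"Class({name} partial)"

print(declare("Amazon"))   # Individual(agrovoc:Amazon)
print(declare("River"))    # Class(agrovoc:River partial)
```

In practice, such a gazetteer would be seeded from the geography and institutions subdomains mentioned above, where most individuals were found.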
Another example concerns thesaurus relations. While RT (Related Term) needs no refinement with respect to the ODM (it is imported as a sub-property of owl:ObjectProperty holding between individuals, and defined on classes), and UF is an owl:DatatypeProperty holding between lexical items (strings), BT (Broader Term) is usually the rdfs:subClassOf property, but is sometimes used as a "part of" owl:ObjectProperty. Translation and refinement have been complemented by transforming the applications of RT, and of the owl:ObjectProperties from FIGIS, into formal owl:Restrictions.

The working hypotheses in making these transformations are that:

— the resulting owl:Restrictions are inherited by all the subclasses of the rdfs:Class to which the restriction pertains, and
— the quantification applicable to restrictions is owl:someValuesFrom.

Both hypotheses are confirmed in most FOS cases. E.g., in AGROVOC, from the original record:

  <Fishing vessel> <RT> <Fishing gear>

it is semantically correct to derive the following transform (we use OWL abstract syntax [31] for most examples in this section of the deliverable):

  Class(agrovoc:Fishing_vessel partial
    (restriction(agrovoc:RT someValuesFrom(agrovoc:Fishing_gear))))

In phase 4 we explain that RT restrictions can be refined in order to make their intended meaning more precise.

A concurrent task has been performed during the translation and transformation phase, which provides the means to fulfil the tasks in phase 3. This task concerns the construction of a Core Ontology, in this case study a Core Ontology of Fishery (COF). For the many theoretical underpinnings of core ontology construction that come from modularization and reuse with respect to foundational ontologies, we refer to [18]. As an example, we only provide here a basic description of COF and of the reusable reference ontologies that have been employed.

Figure 4. The DOLCE+ top level.

COF has been designed by specializing the DOLCE-Lite-Plus ("DOLCE+" in the following; Fig. 4 shows the most general classes) ontology [18], developed within
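The RT-to-restriction transform shown above can be sketched as a small generator (an illustration under the stated working hypotheses, not the project's actual tool):

```python
def rt_to_restriction(subject, related, prop="agrovoc:RT", ns="agrovoc"):
    """Render a thesaurus record <subject> <RT> <related> as an OWL
    abstract syntax partial class definition carrying a someValuesFrom
    restriction, following the working hypotheses above."""
    subj = f"{ns}:{subject.replace(' ', '_')}"
    obj = f"{ns}:{related.replace(' ', '_')}"
    return (f"Class({subj} partial\n"
            f"  (restriction({prop} someValuesFrom({obj}))))")

print(rt_to_restriction("Fishing vessel", "Fishing gear"))
```

Because the restriction is attached as a `partial` (primitive) definition, it is inherited by every subclass of the restricted class, matching the first working hypothesis.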
Reviewing the Design of DAML+OIL: An Ontology Language for the Semantic Web

Ian Horrocks, University of Manchester, Manchester, UK, horrocks@
Peter F. Patel-Schneider, Bell Labs Research, Murray Hill, NJ, U.S.A., pfps@
Frank van Harmelen, Vrije Universiteit, Amsterdam, the Netherlands, Frank.van.Harmelen@cs.vu.nl

Abstract

In the current "Syntactic Web", uninterpreted syntactic constructs are given meaning only by private off-line agreements that are inaccessible to computers. In the Semantic Web vision, this is replaced by a web where both data and its semantic definition are accessible and manipulable by computer software. DAML+OIL is an ontology language specifically designed for this use in the Web; it exploits existing Web standards (XML and RDF), adding the familiar ontological primitives of object-oriented and frame-based systems, and the formal rigor of a very expressive description logic. The definition of DAML+OIL is now over a year old, and the language has been in fairly widespread use. In this paper, we review DAML+OIL's relation with its key ingredients (XML, RDF, OIL, DAML-ONT, Description Logics), we discuss the design decisions and trade-offs that were the basis for the language definition, and identify a number of implementation challenges posed by the current language. These issues are important for designers of other representation languages for the Semantic Web, be they competitors or successors of DAML+OIL, such as the language currently under definition by W3C.

Introduction

In the short span of its existence, the World Wide Web has resulted in a revolution in the way information is transferred between computer applications. It is no longer necessary for humans to set up channels for inter-application information transfer; this is handled by TCP/IP and related protocols. It is also no longer necessary for humans to define the syntax and build parsers used for each kind of information transfer; this is handled by HTML, XML and related standards. However, it is still not possible for applications to interoperate with
other applications without some pre-existing, human-created, and outside-of-the-web agreements as to the meaning of the information being transferred. The next generation of the Web aims to alleviate this problem, making Web resources more readily accessible to automated processes by adding information that describes Web content in a machine-accessible and manipulable fashion. This coincides with the vision that Tim Berners-Lee calls the Semantic Web in his recent book "Weaving the Web" (Berners-Lee 1999).

…they are to be used effectively by automated processes, e.g., to determine the semantic relationships between syntactically different terms.

DAML+OIL is the result of merging DAML-ONT (an early result of the DARPA Agent Markup Language (DAML) programme) and OIL (the Ontology Inference Layer) (Fensel et al. 2001), developed by a group of (largely European) researchers, several of whom were members of the European-funded On-To-Knowledge consortium. Until recently, the development of DAML+OIL has been undertaken by a committee largely made up of members of the two language design teams (and rather grandly titled the Joint EU/US Committee on Agent Markup Languages). More recently, DAML+OIL has been submitted to W3C as a proposal for the basis of the W3C Web Ontology language.

As it is an ontology language, DAML+OIL is designed to describe the structure of a domain. DAML+OIL takes an object-oriented approach, with the structure of the domain being described in terms of classes and properties. An ontology consists of a set of axioms that assert characteristics of these classes and properties. Asserting that resources are instances of DAML+OIL classes or that resources are related by properties is left to RDF, a task for which it is well suited. Since the definition of DAML+OIL is available elsewhere, we will not repeat it here. Instead, in the following sections, we will review a number of fundamental design choices that were made for DAML+OIL: foundations in Description Logic, XML datatypes, layering on top of RDFS, comparison with its predecessor OIL, and the role of inference for a Semantic Web ontology language.

Foundations in Description Logic

DAML+OIL is, in essence, equivalent to a very expressive Description Logic (DL), with a DAML+OIL ontology corresponding to a DL terminology. As in a DL, DAML+OIL classes can be names (URIs in the case of DAML+OIL) or expressions, and a variety of constructors are provided for building class expressions. The expressive power of the language is determined by the class (and property) constructors provided, and by the kinds of axioms allowed.

Figure 1 summarises the constructors in DAML+OIL. The standard DL syntax is used in this paper for compactness, as the RDF syntax is rather verbose. In the RDF syntax, for example, Human ⊓ Male would be written as

  <daml:Class>
    <daml:intersectionOf rdf:parseType="daml:collection">
      <daml:Class rdf:about="#Human"/>
      <daml:Class rdf:about="#Male"/>
    </daml:intersectionOf>
  </daml:Class>

Figure 1: DAML+OIL class constructors
  intersectionOf:   C1 ⊓ C2     e.g. Human ⊓ Male
  unionOf:          C1 ⊔ C2     e.g. Doctor ⊔ Lawyer
  complementOf:     ¬C          e.g. ¬Male
  oneOf:            {x1 … xn}   e.g. {john, mary}
  toClass:          ∀P.C        e.g. ∀hasChild.Doctor
  hasClass:         ∃P.C        e.g. ∃hasChild.Lawyer
  hasValue:         ∃P.{x}      e.g. ∃citizenOf.{USA}
  minCardinalityQ:  ≥ n P.C     e.g. ≥ n hasChild.Lawyer
  maxCardinalityQ:  ≤ n P.C     e.g. ≤ n hasChild.Male
  cardinalityQ:     = n P.C     e.g. = n hasParent.Female

The meanings of the first three constructors from Figure 1 are just the standard boolean operators on classes. The oneOf constructor allows classes to be defined by enumerating their members. The toClass and hasClass constructors correspond to slot constraints in a frame-based language. The class ∀P.C is the class all of whose instances are related via the property P only to resources of type C, while the class ∃P.C is the class all of whose instances are related via P to at least one resource of type C. The hasValue constructor is just shorthand for a combination of hasClass and oneOf.
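The meanings just given can be made concrete with a toy interpreter that evaluates class expressions over a finite interpretation (our own sketch and encoding; DAML+OIL's actual model-theoretic semantics of course ranges over arbitrary domains):

```python
def interpret(expr, domain, classes, props):
    """Evaluate a class expression over a finite interpretation.
    expr is a nested tuple: ("class", name), ("and", e1, e2),
    ("or", e1, e2), ("not", e), ("oneOf", inds), ("all", p, e),
    ("some", p, e), or ("atLeast", n, p, e)."""
    rec = lambda e: interpret(e, domain, classes, props)
    succ = lambda p, x: {y for (a, y) in props.get(p, set()) if a == x}
    op = expr[0]
    if op == "class":
        return set(classes[expr[1]])
    if op == "and":                      # intersectionOf
        return rec(expr[1]) & rec(expr[2])
    if op == "or":                       # unionOf
        return rec(expr[1]) | rec(expr[2])
    if op == "not":                      # complementOf
        return domain - rec(expr[1])
    if op == "oneOf":                    # enumerated class
        return set(expr[1])
    if op == "all":                      # toClass: every P-successor in C
        c = rec(expr[2])
        return {x for x in domain if succ(expr[1], x) <= c}
    if op == "some":                     # hasClass: some P-successor in C
        c = rec(expr[2])
        return {x for x in domain if succ(expr[1], x) & c}
    if op == "atLeast":                  # minCardinalityQ
        c = rec(expr[3])
        return {x for x in domain if len(succ(expr[2], x) & c) >= expr[1]}
    raise ValueError(f"unknown constructor {op}")

# A tiny interpretation (illustrative individuals only).
domain = {"john", "mary", "ann"}
classes = {"Human": {"john", "mary", "ann"}, "Male": {"john"}}
props = {"hasChild": {("john", "ann")}}
human_male = interpret(("and", ("class", "Human"), ("class", "Male")),
                       domain, classes, props)
```

Note the vacuous satisfaction of ∀P.C by individuals with no P-successors, mirroring the DL reading of toClass.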
The minCardinalityQ, maxCardinalityQ and cardinalityQ constructors (known in DLs as qualified number restrictions) are generalisations of the hasClass and hasValue constructors. The class ≥ n P.C (≤ n P.C, = n P.C) is the class all of whose instances are related via the property P to at least (at most, exactly) n different resources of type C. The emphasis on different is because there is no unique name assumption with respect to resource names (URIs): it is possible that many URIs could name the same resource. Note that arbitrarily complex nesting of constructors is possible. The formal semantics of the class constructors is given by DAML+OIL's model-theoretic semantics, or can be derived from the specification of a suitably expressive DL (e.g., see (Horrocks & Sattler 2001)).

Figure 2 summarises the axioms allowed in DAML+OIL. These axioms make it possible to assert subsumption or equivalence with respect to classes or properties, the disjointness of classes, the equivalence or non-equivalence of individuals (resources), and various properties of properties.

Figure 2: DAML+OIL axioms (examples)
  sameIndividualAs:         {Bush} ≡ {G Bush}
  differentIndividualFrom:  {john} ≠ {peter}
  inverseOf:                hasChild ≡ hasParent⁻
  transitiveProperty:       ancestor⁺ ⊑ ancestor
  uniqueProperty:           ⊤ ⊑ ≤ 1 hasMother
  unambiguousProperty:      ⊤ ⊑ ≤ 1 isMotherOf⁻

A crucial feature of DAML+OIL is that subClassOf and sameClassAs axioms can be applied to arbitrary class expressions. This provides greatly increased expressive power with respect to standard frame-based languages, where such axioms are invariably restricted to the form where the left-hand side is a class name, there is only one such axiom per name, and there are no cycles (the class on the right-hand side of an axiom cannot refer, either directly or indirectly, to the class name on the left-hand side). A consequence of this expressive power is that all of the class and individual axioms, as well as the uniqueProperty and unambiguousProperty axioms, can be reduced to subClassOf and sameClassAs axioms (as can be seen from the DL syntax). As we have seen, DAML+OIL also
allows properties of properties to be asserted. It is possible to assert that a property is unique (i.e., functional) and unambiguous (i.e., its inverse is functional). It is also possible to use inverse properties and to assert that a property is transitive.

XML Datatypes in DAML+OIL

DAML+OIL supports the full range of datatypes in XML Schema: the so-called primitive datatypes such as string, decimal or float, as well as more complex derived datatypes such as integer sub-ranges. This is facilitated by maintaining a clean separation between instances of "object" classes (defined using the ontology language) and instances of datatypes (defined using the XML Schema type system). In particular, the domain of interpretation of object classes is disjoint from the domain of interpretation of datatypes, so that an instance of an object class (e.g., the individual "Italy") can never have the same denotation as a value of a datatype (e.g., the integer 5), and the set of object properties (which map individuals to individuals) is disjoint from the set of datatype properties (which map individuals to datatype values).

The disjointness of object and datatype domains was motivated by both philosophical and pragmatic considerations:

- Datatypes are considered to be already sufficiently structured by the built-in predicates, and it is, therefore, not appropriate to form new classes of datatype values using the ontology language (Hollunder & Baader 1991).
- The simplicity and compactness of the ontology language are not compromised: even enumerating all the XML Schema datatypes would add greatly to its complexity, while adding a logical theory for each datatype, even if it were possible, would lead to a language of monumental proportions.

- The semantic integrity of the language is not compromised: defining theories for all the XML Schema datatypes would be difficult or impossible without extending the language in directions whose semantics would be difficult to capture within the existing framework.

- The "implementability" of the language is not compromised: a hybrid reasoner can easily be implemented by combining a reasoner for the "object" language with one capable of deciding satisfiability questions with respect to conjunctions of (possibly negated) datatypes (Horrocks & Sattler 2001).

From a theoretical point of view, this design means that the ontology language can specify constraints on data values, but as data values can never be instances of object classes, they cannot apply additional constraints to elements of the object domain. This allows the type system to be extended without having any impact on the ontology language, and vice versa. Similarly, the formal properties of hybrid reasoners are determined by those of the two components; in particular, the combined reasoner will be sound and complete if both components are sound and complete.

From a practical point of view, DAML+OIL implementations can choose to support some or all of the XML Schema datatypes. For supported datatypes, they can either implement their own type checker/validator or rely on some external component. The job of a type checker/validator is simply to take zero or more data values and one or more datatypes, and determine if there exists any data value that is equal to every one of the specified data values and is an instance of every one of the specified datatypes.

Extending RDF Schema

DAML+OIL is tightly integrated with RDFS: RDFS is used to express
DAML+OIL's machine-readable specification, and RDFS provides the only serialisation for DAML+OIL. While the dependence on RDFS has some advantages in terms of the re-use of existing RDFS infrastructure and the portability of DAML+OIL ontologies, using RDFS to completely define the structure of DAML+OIL is quite difficult as, unlike XML, RDFS is not designed for the precise specification of syntactic structure. For example, there is no way in RDFS to state that a restriction (slot constraint) should consist of exactly one property (slot) and one class.

The solution to this problem adopted by DAML+OIL is to define the semantics of the language in such a way that they give a meaning to any (parts of) ontologies that conform to the RDFS specification, including "strange" constructs such as restrictions with multiple properties and classes. The meaning given to strange constructs may, however, include strange "side effects". For example, in the case of a restriction with multiple properties and classes, the semantics interpret this in the same way as a conjunction of all the constraints that would result from taking the cross product of the specified properties and classes, but with the added (and probably unexpected) effect that all these restrictions must have the same interpretation (i.e., are equivalent).

DAML+OIL's dependence on RDFS may also have consequences for the decidability of the language. Decidability is lost when cardinality constraints can be applied to properties that are transitive, or that have transitive sub-properties (Horrocks, Sattler, & Tobies 1999). There is no way to formally capture this constraint in RDFS, so decidability in DAML+OIL depends on an informal prohibition of cardinality constraints on non-simple properties.

DAML+OIL vs. OIL

From the point of view of language constructs, the differences between OIL and DAML+OIL are relatively trivial.
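The cross-product reading of "strange" multi-property, multi-class restrictions described in the previous section can be sketched as follows (property and class names are illustrative):

```python
from itertools import product

def expand_restriction(properties, classes, kind="toClass"):
    """Interpret a restriction that (unexpectedly) carries several
    properties and several classes as the conjunction of all simple
    restrictions in the cross product of the two lists."""
    return [f"restriction({p} {kind} {c})"
            for p, c in product(properties, classes)]

conjuncts = expand_restriction(["hasChild", "hasPet"], ["Doctor", "Dog"])
# two properties x two classes -> four conjoined simple restrictions
```

The semantics additionally force all the resulting restrictions to have the same interpretation (i.e., to be equivalent), which is the "probably unexpected" side effect noted above; the sketch only shows the cross-product expansion itself.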
Although there is some difference in "keyword" vocabulary, there is usually a one-to-one mapping of constructors, and in the cases where the constructors are not completely equivalent, simple translations are possible.

OIL also uses RDFS for its serialisation (although it also provides a separate XML-based syntax). Consequently, OIL's RDFS-based syntax would seem to be susceptible to the same difficulties as described above for DAML+OIL. However, in the case of OIL there does not seem to be an assumption that any ontology conforming to the RDFS meta-description should be a valid OIL ontology; presumably, ontologies containing unexpected usages of the meta-properties would be rejected by OIL processors, as the semantics do not specify how these could be translated. Thus, OIL and DAML+OIL take rather different positions with regard to the layering of languages on the Semantic Web.

Another effect of DAML+OIL's tight integration with RDFS is that the frame structure of OIL's syntax is much less evident: a DAML+OIL ontology is more DL-like in that it consists largely of a relatively unstructured collection of subsumption and equality axioms. This can make it more difficult to use DAML+OIL with frame-based tools such as Protégé (Grosso et al. 1999) or OilEd (Bechhofer et al. 2001), because the axioms may be susceptible to many different frame-like groupings (Bechhofer, Goble, & Horrocks 2001).

The treatment of individuals in OIL is also very different from that in DAML+OIL. In the first place, DAML+OIL relies wholly on RDF for assertions involving the type (class) of an individual or a relationship between a pair of objects.
In the second place, DAML+OIL treats individuals occurring in the ontology (in oneOf constructs or hasValue restrictions) as true individuals (i.e., interpreted as single elements in the domain of discourse) and not as primitive concepts, as is the case in OIL. This weak treatment of the oneOf construct is a well-known technique for avoiding the reasoning problems that arise with existentially defined classes, and is also used, e.g., in the CLASSIC knowledge representation system (Borgida & Patel-Schneider 1994). Moreover, DAML+OIL makes no unique name assumption: it is possible to explicitly assert that two individuals are the same or different, or to leave their relationship unspecified. This treatment of individuals is very powerful, and justifies intuitive inferences that would not be valid for OIL, e.g., that persons all of whose countries of residence are Italy are kinds of person that have at most one country of residence:

  Person ⊓ ∀residence.{Italy} ⊑ ≤ 1 residence

Inference in DAML+OIL

As we have seen, DAML+OIL is equivalent to a very expressive DL. More precisely, DAML+OIL is equivalent to the SHIQ DL (Horrocks, Sattler, & Tobies 1999) with the addition of existentially defined classes (i.e., the oneOf constructor) and datatypes (often called concrete domains in DLs (Baader & Hanschke 1991)). This equivalence allows DAML+OIL to exploit the considerable existing body of description logic research: to define the semantics of the language and to understand its formal properties, in particular the decidability and complexity of key inference problems (Donini et al. 1997); as a source of sound and complete algorithms and optimised implementation techniques for deciding key inference problems (Horrocks, Sattler, & Tobies 1999; Horrocks & Sattler 2001); and to use implemented DL systems in order to provide (partial) reasoning support (Horrocks 1998a; Patel-Schneider 1998; Haarslev & Möller 2001).

An important consideration in the design of DAML+OIL was that key inference problems in the language, in particular class
consistency/subsumption, to which most other inference problems can be reduced, should be decidable, as this facilitates the provision of reasoning services. Moreover, the correspondence with DLs facilitates the use of DL algorithms that are known to be amenable to optimised implementation and to behave well in realistic applications in spite of their high worst-case complexity (Horrocks 1998b; Haarslev & Möller 2001). Maintaining the decidability of the language requires certain constraints on its expressive power that may not be acceptable to all applications. However, the designers of the language decided that reasoning would be important if the full power of ontologies was to be realised, and that a powerful but still decidable ontology language would be a good starting point. Reasoning can be useful at many stages during the design, maintenance and deployment of ontologies. Reasoning can be used to support ontology design and to improve the quality of the resulting ontology. For example, class consistency and subsumption reasoning can be used to check for logically inconsistent classes and (possibly unexpected) implicit subsumption relationships (Bechhofer et al. 2001). This kind of support has been shown to be particularly important with large ontologies, which are often built and maintained over a long period by multiple authors. Other reasoning tasks, such as "matching" (Baader et al. 1999) and/or computing least common subsumers (Baader & Küsters 1998), could also be used to support "bottom up" ontology design, i.e., the identification and description of relevant classes from sets of example instances. Like information integration (Calvanese et al. 1998), ontology integration can also be supported by reasoning. For example, integration can be performed using inter-ontology assertions specifying relationships between classes and properties, with reasoning being used to compute the integrated hierarchy and to highlight any problems/inconsistencies. Unlike some other integration
techniques, this method has the advantage of being non-intrusive with respect to the original ontologies. Reasoning with respect to deployed ontologies will enhance the power of "intelligent agents", allowing them to determine if a set of facts is consistent w.r.t. an ontology, to identify individuals that are implicitly members of a given class, etc. A suitable service ontology could, for example, allow an agent seeking secure services to identify a service requiring a userid and password as a possible candidate.

Challenges

Class consistency/subsumption reasoning in DAML+OIL is known to be decidable (as it is contained in the C2 fragment of first order logic (Grädel, Otto, & Rosen 1997)), but many challenges remain for implementors of "practical" reasoning systems, i.e., systems that perform well with the kinds of reasoning problem generated by realistic applications.

Individuals. Unfortunately, the combination of DAML+OIL individuals with inverse properties is so powerful that it pushes the worst-case complexity of the class consistency problem from ExpTime (for the DL underlying OIL) to NExpTime. No "practical" decision procedure is currently known for this logic, and there is no implemented system that can provide sound and complete reasoning for the whole DAML+OIL language. In the absence of inverse properties, however, a tableaux algorithm has been devised (Horrocks & Sattler 2001), and in the absence of individuals (in extensionally defined classes), DAML+OIL can exploit implemented DL systems via a translation into the underlying DL (extended with datatypes) similar to the one used by OIL. It would, of course, also be possible to translate DAML+OIL ontologies into this DL using OIL's weak treatment of individuals, but in this case reasoning with individuals would not be complete with respect to the semantics of the language. This approach is taken by some existing applications, e.g., OilEd (Bechhofer et al. 2001).

Scalability. Even without the oneOf constructor, class consistency reasoning is still a hard problem. Moreover, Web ontologies can be expected to
grow very large, and with deployed ontologies it may also be desirable to reason w.r.t. large numbers of class/property instances. There is good evidence of empirical tractability and scalability for implemented DL systems (Horrocks 1998b; Haarslev & Möller 2001), but this is mostly w.r.t. logics that do not include inverse properties (e.g., (Horrocks, Sattler, & Tobies 1999)). Adding inverse properties makes practical implementations more problematical, as several important optimisation techniques become much less effective. Work is required in order to develop more highly optimised implementations supporting inverse properties, and to demonstrate that they can scale as well as existing implementations. It is also unclear if existing techniques will be able to cope with large numbers of class/property instances (Horrocks, Sattler, & Tobies 2000). Finally, it is an inevitable consequence of the high worst-case complexity that some problems will be intractable, even for highly optimised implementations. It is conjectured that such problems rarely arise in practice, but the evidence for this conjecture is drawn from a relatively small number of applications, and it remains to be seen if a much wider range of Web application domains will demonstrate similar characteristics.

New Reasoning Tasks. So far we have mainly discussed class consistency/subsumption reasoning, but this may not be the only reasoning problem that is of interest. Other tasks could include querying, explanation, matching, computing least common subsumers, etc. Querying in particular may be important in Semantic Web applications. Some work on query languages for DLs has already been done (Calvanese, De Giacomo, & Lenzerini 1999; Horrocks & Tessaris 2000), and work is underway on the design of a DAML+OIL query language, but the computational properties of such a language, either theoretical or empirical, have yet to be determined. Explanation may also be an important problem, e.g., to help an ontology designer to rectify problems identified by
reasoning support, or to explain to a user why an application behaved in an unexpected manner. As discussed above, reasoning problems such as matching and computing least common subsumers could also be important in ontology design.

Discussion

There are other concerns with respect to the place DAML+OIL has in the Semantic Web. After DAML+OIL was developed, the W3C RDF Core Working Group devised a model theory for RDF and RDFS, which is incompatible with the semantics of DAML+OIL, an undesirable state of affairs. Also, in late 2001 W3C initiated the Web Ontology working group, a group tasked with developing an ontology language for the Semantic Web. DAML+OIL has been submitted to this working group as a starting point for a W3C recommendation on ontology languages. A W3C ontology language needs to fit in with other W3C recommendations even more than an independent DAML+OIL would. Work is thus needed to develop a Semantic Web ontology language, which the Web Ontology working group has tentatively named OWL, that layers better on top of RDF and RDFS. Unfortunately, the obvious layering (that is, using the same syntax as RDF and extending its semantics, just as RDFS does) is not possible. Such an extension results in semantic paradoxes: variants of the Russell paradox. These paradoxes arise from the status of all classes (including DAML+OIL restrictions) as individuals, which requires that many restrictions be present in all models; from the status of the class membership relationship as a regular property (rdf:type); from the ability to make contradictory statements; and from the ability to create restrictions that refer to themselves. In an RDFS-compliant version of DAML+OIL, a restriction that states that its instances have no rdf:type relationships to itself is not only possible to state, but exists in all models, resulting in an ill-formed logical formalism.
The obvious way around this problem, that of using non-RDF syntax for DAML+OIL restrictions, appears to be meeting with considerable resistance, so either further education or some other solution is needed.

Conclusion

We have discussed a number of fundamental design decisions underlying the design of DAML+OIL, in particular its foundation in Description Logic, its use of datatypes from XML Schema, its sometimes problematic layering on top of RDF Schema, and its deviations from its predecessor OIL. We have also described how various aspects of the language are motivated by the desire for tractable reasoning facilities. Although a number of challenges remain, DAML+OIL has considerable merits. In particular, the basic idea of having a formally-specified web language that can represent ontology information will go a long way towards allowing computer programs to interoperate without pre-existing, outside-of-the-web agreements. If this language also has an effective reasoning mechanism, then computer programs can manipulate this interoperability information themselves, and determine whether a common meaning for the information that they pass back and forth is present.

References

Baader, F., and Hanschke, P. 1991. A schema for integrating concrete domains into concept languages. In Proc. of IJCAI-91, 452–457.
Baader, F., and Küsters, R. 1998. Computing the least common subsumer and the most specific concept in the presence of cyclic concept descriptions. In Proc. of KI'98, 129–140. Springer-Verlag.
Baader, F.; Küsters, R.; Borgida, A.; and McGuinness, D. L. 1999. Matching in description logics. J. of Logic and Computation 9(3):411–447.
Bechhofer, S.; Horrocks, I.; Goble, C.; and Stevens, R. 2001.
OilEd: a reason-able ontology editor for the Semantic Web. In Proc. of the Joint German/Austrian Conf. on Artificial Intelligence (KI 2001), 396–408. Springer-Verlag.
Bechhofer, S.; Goble, C.; and Horrocks, I. 2001. DAML+OIL is not enough. In Proc. of the First Semantic Web Working Symposium (SWWS'01), 151–159. CEUR Electronic Workshop Proceedings.
Berners-Lee, T. 1999. Weaving the Web. San Francisco: Harper.
Borgida, A., and Patel-Schneider, P. F. 1994. A semantics and complete algorithm for subsumption in the CLASSIC description logic. J. of Artificial Intelligence Research 1:277–308.
Calvanese, D.; De Giacomo, G.; Lenzerini, M.; Nardi, D.; and Rosati, R. 1998. Information integration: Conceptual modeling and reasoning support. In Proc. of CoopIS'98, 280–291.
Calvanese, D.; De Giacomo, G.; and Lenzerini, M. 1999. Answering queries using views in description logics. In Proc. of DL'99, 9–13. CEUR Electronic Workshop Proceedings.
Decker, S.; van Harmelen, F.; Broekstra, J.; Erdmann, M.; Fensel, D.; Horrocks, I.; Klein, M.; and Melnik, S. 2000. The Semantic Web: The roles of XML and RDF. IEEE Internet Computing 4(5).
Donini, F. M.; Lenzerini, M.; Nardi, D.; and Nutt, W. 1997. The complexity of concept languages. Information and Computation 134:1–58.
Fensel, D.; van Harmelen, F.; Horrocks, I.; McGuinness, D. L.; and Patel-Schneider, P. F. 2001. OIL: An ontology infrastructure for the Semantic Web. IEEE Intelligent Systems 16(2):38–45.
Grädel, E.; Otto, M.; and Rosen, E. 1997. Two-variable logic with counting is decidable. In Proc. of LICS-97, 306–317. IEEE Computer Society Press.
Grosso, W. E.; Eriksson, H.; Fergerson, R. W.; Gennari, J. H.; Tu, S. W.; and Musen, M. A. 1999. Knowledge modelling at the millennium (the design and evolution of Protégé-2000). In Proc. of the Knowledge Acquisition Workshop (KAW-99).
Haarslev, V., and Möller, R. 2001. High performance reasoning with very large knowledge bases: A practical case study. In Proc. of IJCAI-01.
Hollunder, B., and Baader, F. 1991. Qualifying number restrictions in concept languages. In Proc. of KR-91, 335–346.
Horrocks, I., and Sattler, U. 2001. Ontology reasoning in the (D) description logic. In Proc. of IJCAI-01. Morgan Kaufmann.
Horrocks, I., and Tessaris, S. 2000. A conjunctive query language for description logic ABoxes. In Proc. of AAAI 2000, 399–404.
Horrocks, I.; Sattler, U.; and Tobies, S. 1999. Practical reasoning for expressive description logics. In Ganzinger, H.; McAllester, D.; and Voronkov, A., eds., Proc. of LPAR'99, 161–180. Springer-Verlag.
Horrocks, I.; Sattler, U.; and Tobies, S. 2000. Reasoning with individuals for the description logic. In Proc. of CADE-17, LNAI, 482–496.
Horrocks, I. 1998a. The FaCT system. In de Swart, H., ed., Proc. of TABLEAUX-98, 307–312. Springer-Verlag.
Horrocks, I. 1998b. Using an expressive description logic: FaCT or fiction? In Proc. of KR-98, 636–647.
McGuinness, D. L. 1998. Ontological issues for knowledge-enhanced search. In Proc. of FOIS, Frontiers in Artificial Intelligence and Applications. IOS Press.
McIlraith, S.; Son, T.; and Zeng, H. 2001. Semantic Web services. IEEE Intelligent Systems 16(2):46–53.
Patel-Schneider, P. F. 1998. DLP system description. In Proc. of DL'98, 87–89. CEUR Electronic Workshop Proceedings.
Semantic Web

The Semantic Web (语义网 in Chinese) is a web that can make judgments based on semantics. Put simply, it is an intelligent web that can understand human language and make communication between people and computers as easy as communication between people. The Semantic Web is a vision of the future Web in which information is given well-defined meaning, so that machines can automatically process and integrate the information available on the Web. The Semantic Web uses XML to define customized tag formats and exploits the flexibility of RDF to express data; the next step required is an ontology language for the Web (such as OWL) to describe the precise meaning of the terms in Web documents and the relationships between them. Such a language adds further vocabulary for describing properties and classes, for example disjointness between classes, cardinality, equivalence, richer typing of properties, property characteristics (e.g. symmetry), and enumerated classes. Some basic characteristics of the Semantic Web: (1) it is not separate from the current WWW but an extension of it; (2) the current WWW is oriented towards documents, whereas the Semantic Web is oriented towards the data those documents represent; (3) the Semantic Web is better suited to machine "understanding and processing" and will possess a certain capability for judgment and inference.
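A toy sketch of what the added disjointness vocabulary buys: once two classes are declared disjoint, an individual asserted to belong to both becomes a machine-detectable contradiction. All class and individual names below are illustrative, and the triple-free set encoding is a deliberate simplification:

```python
# Hypothetical disjointness axioms and type assertions, encoded as plain sets.
disjoint = {frozenset({"Person", "Publication"})}
types = {"entry42": {"Person", "Publication"},  # contradictory assertion
         "entry7": {"Person"}}

def disjointness_violations(types, disjoint):
    """Return (individual, classes) pairs that violate a disjointness axiom."""
    bad = []
    for ind, classes in types.items():
        for pair in disjoint:
            if pair <= classes:  # individual typed by both disjoint classes
                bad.append((ind, tuple(sorted(pair))))
    return bad

print(disjointness_violations(types, disjoint))
```

A full OWL reasoner performs this check (and much more) over the RDF encoding of the axioms; the sketch only illustrates why machine-readable disjointness makes such judgments possible.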
Although the Semantic Web shows us a promising future for the WWW and the Internet revolution it would bring, its realization still faces enormous challenges: (1) availability of content, i.e. there are still very few Semantic Web pages built on ontologies; (2) ontology development and evolution, including the development of core ontologies for all domains, methodological and technical support during development, and ontology evolution, annotation and version control; (3) scalability of content, i.e. once Semantic Web content exists, how to manage it in a scalable way, including how to organize, store and retrieve it; (4) multilingual support; (5) standardization of ontology languages.
The difference between the Semantic Web and the WWW: the Semantic Web is "not the existing WWW, whose data is mainly intended for human use; the new generation of the WWW will also provide data that computers can process, making a wide range of intelligent services possible". The goal of Semantic Web research activities is "to develop a series of languages and technologies for expressing semantic information that computers can understand and process, in order to support broad and effective automated reasoning in the Web environment".
A Comparative Analysis and Study of Current Major Ontology Reasoners
New Technology of Library and Information Service (《现代图书情报技术》), 2006, No. 12 (No. 144 overall), Digital Library column. Xu Dezhi, Wang Zhiyong, Wang Bin (College of Information Science and Engineering, Central South University, Changsha 410083, China).

Abstract: Through a detailed analysis of a number of current mainstream ontology reasoners, this paper derives a general system structure for ontology reasoners. After introducing three typical reasoner systems (Pellet, Racer, FaCT++), it designs and implements a test plan for comparing different ontology reasoners from the three perspectives of system functionality, application users and developers; experiments show that the test plan is feasible and effective. Finally, the paper summarizes some problems of current ontology reasoners and their future development trends.

Keywords: Semantic Web; ontology reasoner; system structure; test plan. Classification number: TP391.

1 Introduction

Berners-Lee proposed a semantics-based architecture for the future development of the Web [1]. Ontology lies at the fourth layer of the Semantic Web architecture and is the basis for sharing and exchanging Web information at the semantic level. Since ontology reasoners are among the indispensable supporting tools for creating and using ontologies, many research institutions at home and abroad have developed a large number of them; typical examples include the reasoners used by W3C for ontology testing [2], the description-logic-based reasoners recommended by DIG [3], and the inference engines integrated into Semantic Web development platforms (such as Jena2 from HP Labs and KAON2 from the University of Karlsruhe) and ontology management systems (such as IBM's SNOBASE). Faced with so many ontology reasoners, how should users with different needs choose a suitable one? To answer this question, current reasoner systems must be analyzed in detail and tested comparatively. Reference [4] proposed a method for benchmarking reasoners using realistic ontologies, but only from the single angle of system functionality; reference [5] focused on a detailed comparative test of ontology representation and query languages, without comparing concrete reasoner systems. To help users better understand, use and develop ontology reasoners, this paper designs a test plan that compares reasoners from the three perspectives of system functionality, application users and developers, and carries out a comparative experimental analysis of three typical reasoner systems. We first introduce the general system structure of ontology reasoners. (Received 2006-09-11. Supported by the Hunan Provincial Natural Science Foundation project "Research on Aspectized Component Models and Their Composition Architecture Evaluation", No. 05JJ40312.)

2 System Structure of Ontology Reasoners

After a detailed analysis of the collected ontology reasoners, a general system structure was derived (Figure 1: system structure of an ontology reasoner). It consists of five modules: an ontology parser, a query parser, an inference engine, a result output module and an API.

2.1 The ontology parser reads and parses ontology files. It determines which ontology file formats the reasoner can support, such as RDF, OWL and SWRL, and its performance directly determines whether the reasoner can parse large ontology files.

2.2 The query parser parses the user's query commands. Although SPARQL has become a candidate standard query language for RDF, there is still no generally accepted standard query language for OWL; the most widely used at present include RDQL,
nRQL and OWL-QL.

2.3 The inference engine accepts the parsed ontology file and query commands and executes the reasoning process. It is the core component of an ontology reasoner, since it directly determines the system's reasoning capability; most current inference engines are implemented with description logic tableau algorithms.

2.4 The result output module packages the results derived by the inference engine to meet different user needs. It determines the output formats the reasoner can support, commonly XML, RDF and OWL.

2.5 The API module is aimed mainly at developers and generally comprises three parts: an OWL API, a DIG interface and programming-language interfaces. An OWL API offers a standard interface for manipulating OWL ontology files; there is as yet no recommended standard, only two widely used implementations (the WonderWeb OWL API and the Protégé OWL API). The DIG interface provides a set of standard interfaces through which description logic reasoners expose their services, playing a role similar to that of ODBC for databases: it allows a front end (such as an ontology editor) to attach to different back-end inference engines; the latest version is 2.0. The common programming-language interfaces are Lisp and Java, since most reasoner systems are implemented in these two languages.

3 Three Typical Ontology Reasoners

On the three principles of being up to date, widely used and representative, three typical reasoner systems were selected for introduction and comparative analysis: Racer, Pellet and FaCT++.

3.1 Racer. Racer (Renamed ABox and Concept Expression Reasoner) is an ontology reasoner with description logics as its theoretical basis; the latest version, RacerPro 1.9, is a powerful commercial reasoner that can be used both as a description logic system and as a semantic knowledge base system, in either standalone or client/server mode.

3.2 Pellet. Pellet was developed by the MINDSWAP project at the University of Maryland specifically for OWL DL and is implemented with description logic tableau algorithms; the latest version is Pellet 1.3. It supports all features of OWL DL, including reasoning over enumerated classes and XML datatypes, and is an open-source project.

3.3 FaCT++. FaCT++ [8] is the new generation of FaCT (Fast Classification of Terminologies), a description logic classifier developed at the University of Manchester that provides satisfiability testing for modal logic and adopts a CORBA-based client/server mode. For better efficiency and portability, FaCT++ is implemented in C++ rather than FaCT's Lisp, and its source code is open.

4 Design and Implementation of the Test Plan for Ontology Reasoners

There is as yet no accepted standard test plan for ontology reasoners. Existing comparisons concentrate on system functionality; put simply, on which reasoner is the most powerful and the fastest. Since the testers are usually the developers of the systems under test, neither the selection of test data nor the design of the test plan can be guaranteed to be comprehensive and objective, and such tests cannot provide a complete reference for application and development users. Considering the basic functions and the general system structure of ontology reasoners, and drawing on traditional comparative test plans for description logic reasoners [9], this paper proposes a comprehensive test plan (Figure 2: overall design of the test plan) that compares systems from the perspectives of system functionality, application users and developers. The functionality test measures the two basic reasoning functions of an ontology reasoner [10]: ontology consistency checking and the ability to obtain implicit knowledge. The application-user analysis considers aspects such as whether the user interface is friendly; the developer analysis examines whether the reasoner provides rich programming interfaces; finally, the results of the three aspects are combined into an overall evaluation.

4.1 System Functionality Test. The functionality test compares the systems on three metrics: ontology loading/parsing time, consistency-checking time, and reasoning/query time, covering the two basic reasoning functions. A more complete performance test would also cover concept classification, instance categorization, ontology subsumption checking and other functions. (1) Selection of test ontologies. To keep the test data general, the selection considered the file format, the ontology category, whether the ontology is consistent, the file size, the numbers of classes, properties and individuals defined, and the data source. The file formats cover XML, RDF, OWL, DAML and SWRL; the ontology categories cover the Lite, DL and Full levels; and the data include both consistent and inconsistent ontologies. The test ontologies come from four sources: some are generated by the data generator of the SWAT project at Lehigh University [11]; some are selected from the Pellet test files; some are taken from the Protégé Ontology
Library; and the rest are randomly selected from the Web via the ontology search engine Swoogle. This makes the test data general, broad and representative. (2) Design of query statements. To make the test queries representative, three factors were considered. (i) Query type: whether the query set covers the three types of ontology query (Boolean, retrieval and conjunctive queries [12]); a Boolean query is the simplest kind and can generally be reduced to an ontology consistency check, whereas retrieval and conjunctive queries may require reasoning over the ontology. (ii) Result size: the proportion of classes or instances in the query result relative to the ontology file should exceed a threshold, generally chosen per test ontology. (iii) Reasoning complexity: the depth of the class and property hierarchies involved and whether complex operations such as inverse-relation reasoning are needed; in general, the deeper the hierarchies involved, the higher the complexity. Three typical simple queries, expressed in OWL-QL syntax, are given below (Table 1):

Query  Statement                                                                              Type
Q1     (Tom student)                                                                          Boolean
Q2     (type ?x GraduateStudent)                                                              retrieval
Q3     (type ?x Student)(type ?y Department)(memberOf ?y ?x)(nameOf ComputerScience ?y)       conjunctive

Q1 asks whether Tom is a student; Q2 retrieves the names of all graduate students; Q3 retrieves the names of all students of the school of information science. Detailed query-time data are given in the analysis of the experimental results.

4.2 Comparison from the Application User Perspective. This perspective considers whether a reasoner offers application users a friendly interface, documentation and demos, and convenient connection to mainstream ontology editors. Table 2 gives the results:

                   Racer             Pellet          FaCT++
User interface     GUI, complex      command line    none
User manual        detailed          none            none
Demo               none              online          none
Supported formats  owl, racer, swrl  xml, rdf, owl   (illegible in source)
Query languages    nRQL, OWL-QL      RDQL            none
Editors            Protégé, OilEd    Protégé, OilEd  Protégé, OilEd

As the table shows, although Racer, Pellet and FaCT++ all provide good reasoning support for mainstream ontology editors, none offers application users a comprehensive, friendly environment: either there is no graphical interface, or the interface is too complex.
Moreover, Pellet and FaCT++ support too few ontology query languages, and FaCT++ does not support current mainstream ontology representation languages such as OWL.

4.3 Comparison from the Developer Perspective. This perspective considers whether a reasoner provides detailed reference documentation and programming interfaces for secondary development, whether it is open source, and whether sample code is available. Table 3 gives the results for Racer, Pellet and FaCT++:

                 Racer             Pellet               FaCT++
Language         Lisp              Java                 C++
Open source      commercial        yes                  yes
API              DIG, Lisp, Java   DIG, OWL API, Jena   DIG
Documentation    detailed          average              none
Sample code      yes               yes                  none

The results show that Racer and Pellet both provide developers with rich interfaces for secondary development, documentation and sample code; Lisp programmers may choose Racer for secondary development, while Java programmers may consider Pellet.

5 Analysis of Experimental Results and Outlook

Test platform: Pentium 4 CPU 2.60 GHz, 256 MB RAM, 80 GB disk, Windows XP SP2, Java JDK 1.5. Reasoners tested: Racer 1.9.0, Pellet 1.3, FaCT++ 1.3.1; editor: Protégé 3.2 Beta. Note: since FaCT++ provides no OWL interface and does not support instance (ABox) queries, only Racer and Pellet were compared in the loading and query tests; for reasons of space, detailed information on the test ontologies is not listed. The ontology loading times are shown in Figure 3 (line chart of loading time in seconds against the number of tuples, in units of 100). Loading time grows with the number of tuples in the ontology file. With few tuples, Racer and Pellet load in about the same time, but beyond roughly ten thousand tuples Racer's curve rises more slowly than Pellet's, indicating that Racer supports large ontology files well, consistent with its positioning as a commercial system. The detailed consistency-checking times (in seconds) are given in Table 4:

                        O1     O2     O3     O4
Size                    5k     13k    46k    151k
Inconsistent concepts   1      14     4      576
Racer/DIG               0.84   1.59   1.26   2.84
Pellet/DIG              0.78   1.63   error  1.91
FaCT++/DIG              0.65   1.5    0.61   error
Pellet/standalone       0.02   0.018  0.017  error

Racer checks the consistency of all test ontologies successfully; Racer and Pellet take roughly the same time, and FaCT++ takes the least. The tests show that consistency-checking time is determined mainly by the number of inconsistent concepts and the cause of the inconsistency rather than by file size: O2 is smaller than O3 but contains more inconsistent concepts and therefore takes longer. It is also apparent that checking consistency through the DIG plug-in route costs more time than running the reasoner standalone, because communication between the editor and the reasoner consumes a large amount of time. Table 5 gives the query times (in ms):

        Racer                 Pellet
        Q1     Q2     Q3      Q1     Q2     Q3
Time    113    1014   53      108    43     6247

Racer and Pellet spend about the same time on Q1 (the Boolean query), since executing a Boolean query amounts to a consistency check; this agrees with the consistency-time results. Racer takes much longer than Pellet on Q2 (the retrieval query), mainly because before querying Racer performs indexing and other preparatory work on the ontology to speed up later queries. Pellet, however, exceeds Racer on Q3 (the conjunctive query), because Q3 involves TBox reasoning, where Racer is evidently stronger than Pellet. In general, then, Pellet is worth considering for pure ABox reasoning queries, while Racer is recommended when queries involve TBox reasoning.
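The three timing metrics reported above (loading, consistency checking, querying) can be sketched as a tiny harness; the stub callables stand in for a real reasoner driven through its DIG or programming interface, and all names are illustrative assumptions:

```python
import time

def benchmark(load, check_consistency, query):
    """Time the three phases of the functionality test and collect results."""
    results = {}
    t0 = time.perf_counter()
    onto = load()
    results["load_s"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    results["consistent"] = check_consistency(onto)
    results["consistency_s"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    results["answers"] = query(onto)
    results["query_s"] = time.perf_counter() - t0
    return results

# Stub "reasoner": a toy ontology, a trivial consistency check, a canned query.
stats = benchmark(lambda: {"Student": "Person"},
                  lambda onto: True,
                  lambda onto: ["Tom"])
print(sorted(k for k in stats if k.endswith("_s")))
```

Wrapping each phase separately is what lets the paper attribute time to parsing, checking and querying individually, including the overhead of the editor/reasoner communication path.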
For all the above results it must be borne in mind that different hardware and software platforms, test ontologies and query statements can greatly affect the measurements; only a relative comparison can be drawn here. Overall, Racer and Pellet both implement the two basic functions of consistency checking and mining implicit knowledge, but on balance Racer rates better than Pellet. Although FaCT++ leads in consistency checking, it receives the lowest overall rating because it supports neither OWL nor an ontology query language. This outcome agrees with most published evaluations, indicating that the test plan is feasible and effective.

6 Conclusion and Outlook

The analysis of current ontology reasoners and the comparative test of three typical systems show that although most reasoners implement the two basic reasoning functions, shortcomings remain: Racer does not support reasoning over enumerated classes and user-defined datatypes; Pellet lacks support for the ontology rule language SWRL and supports too few query languages; and all lack reasoning support for multiple ontologies, inconsistent ontologies and large-scale ontology servers, as well as friendly user interfaces. Future ontology reasoners should therefore provide stronger and more complete reasoning algorithms (e.g. support in description logics for quantifier restrictions, user-defined datatypes and inverse-relation property types), allow user-defined inference rules, support reasoning over multiple, multi-version and inconsistent ontologies, and offer friendlier GUIs and richer programming interfaces. The next step of this work is to further refine the test plan and implement a test platform.

References

1 Tim Berners-Lee, James Hendler, Ora Lassila. The Semantic Web. Scientific American, 2001, 284(5): 34–43.
2 OWL Test Results (Semi-Official Semi-Static View). (Accessed Sept. 1, 2006)
3 Description Logic Reasoners. (Accessed Sept. 1, 2006)
4 Z. Pan. Benchmarking DL Reasoners Using Realistic Ontologies. In Proc. of the International Workshop on OWL: Experience and Directions (OWL-ED 2005), Galway, Ireland, 2005.
5 Zhijun Zhang. Ontology Query Languages for the Semantic Web: A Performance Evaluation. Master's Thesis, 2005, 5–34.
6 Racer Systems GmbH & Co. KG. (Accessed Sept. 1, 2006)
7 Pellet OWL Reasoner. (Accessed Sept. 1, 2006)
8 OWL: FaCT++. (Accessed Sept. 1, 2006)
9 Atila Kaya, Keno Seizer. Design and Implementation of a Benchmark Testing Infrastructure for the DL System Racer. In Proc. of the KI 2004 International Workshop on Applications of Description Logics (ADL 04), Ulm, Germany, 2004.
10 Gao Qi, Chen Huajun. Comparison and analysis of Internet ontology languages and reasoning. Computer Applications and Software, 2004, 21(10): 73–76.
11 SWAT Projects: the Lehigh University Benchmark (LUBM). (Accessed Sept. 1, 2006)
IT Course Objective Ontology and Its Semantic Web Application
Attributes of concepts.
0 Introduction
At present, building an effective integration mechanism for web-based teaching resources and a web-based teaching platform that meets individualized learning needs has become an important topic in web-based education. This paper studies an Ontology meta-model based on role concepts, uses the model to build an IT course objective Ontology, and designs and implements an Ontology-based integration mechanism for web teaching resources and a web teaching platform that meets individualized needs.
Computer Applications and Software (《计算机应用与软件》), Vol. 25, No. 2, Feb. 2008
I T课 程 目标 Onoo y及 其 语 义 W e tlg b应 用
王晓东 王岁花 张合 王红涛
(College of Computer and Information Technology, Henan Normal University, Xinxiang, Henan 453007, China)
Abstract: Developing a web-based learning service platform on top of an Ontology is an important topic in web-based education. Building the Ontology from basic concepts and role concepts makes the representation …
Ontology Learning
1.3 Ontology Construction
Manual construction: time-consuming, labor-intensive and error-prone. Fully automatic construction: not widely applicable. Semi-automatic construction: feasible; its core technique is ontology learning, i.e. acquiring knowledge from data sources using knowledge discovery techniques.
2. Ontology Learning
2.1 Ontology learning cycle
2.2 Ontology learning framework
2.3 Data import and processing techniques
2.4 Ontology learning algorithms
1. Research Background
The Semantic Web adopts a multi-layer representation framework. Ontology sits at the layer where document description turns into knowledge reasoning, and thus occupies an important position: constructing the ontology is the key step in realizing the Semantic Web.
1.1 Ontology
An Ontology is an explicit, formal specification of a shared conceptual model.
"Conceptual model": the Ontology is a model obtained by abstracting the concepts relevant to some phenomena of the objective world. "Explicit": the concepts used in the Ontology, and the constraints on those concepts, are explicitly defined. "Formal": the Ontology is computer-readable (i.e. processable by a computer). "Shared": the Ontology embodies commonly accepted knowledge and reflects the set of concepts recognized in the relevant domain (i.e. an Ontology targets the consensus of a community, not of an individual).
FCA-Merge (step 1): generate two formal contexts.

A formal context is a triple K := (G, M, I), where G is a set of objects, M is a set of attributes, and I is a binary relation between G and M, i.e. I ⊆ G × M; (g, m) ∈ I is read as "object g has attribute m".
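The definition can be made concrete in a few lines. The sketch below (with toy objects and attributes chosen only for illustration) implements the two derivation operators of FCA and the resulting test for a formal concept: a pair (A, B) is a formal concept iff A' = B and B' = A.

```python
# Formal context K = (G, M, I) with illustrative objects and attributes.
G = {"duck", "eagle", "cat"}
M = {"can_fly", "lays_eggs", "is_predator"}
I = {("duck", "can_fly"), ("duck", "lays_eggs"),
     ("eagle", "can_fly"), ("eagle", "lays_eggs"), ("eagle", "is_predator"),
     ("cat", "is_predator")}

def common_attributes(objs):
    """A' : attributes shared by every object in objs."""
    return {m for m in M if all((g, m) in I for g in objs)}

def common_objects(attrs):
    """B' : objects that have every attribute in attrs."""
    return {g for g in G if all((g, m) in I for m in attrs)}

def is_formal_concept(objs, attrs):
    """(A, B) is a formal concept iff A' = B and B' = A."""
    return common_attributes(objs) == attrs and common_objects(attrs) == objs

print(is_formal_concept({"duck", "eagle"}, {"can_fly", "lays_eggs"}))  # True
```

Enumerating all such pairs (which FCA-Merge does far more efficiently) yields exactly the concepts of the lattice built in step 2.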
2.3 Data Import and Processing Techniques

FCA-Merge (step 2): merge the two formal contexts generated in the previous step into a single concept lattice.
1.1 Ontology

The structure of an Ontology is a five-tuple

O := {C, R, HC, rel, AO}
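Read as a data structure, the five-tuple might be sketched like this; the field names and toy entries are illustrative assumptions, not a fixed API:

```python
from dataclasses import dataclass, field

@dataclass
class Ontology:
    concepts: set                                # C  : concept identifiers
    relations: set                               # R  : relation identifiers
    taxonomy: set = field(default_factory=set)   # HC : (sub, super) concept pairs
    rel: dict = field(default_factory=dict)      # rel: relation -> (domain, range)
    axioms: set = field(default_factory=set)     # AO : additional axioms

o = Ontology(
    concepts={"Person", "Student", "Course"},
    relations={"enrolledIn"},
    taxonomy={("Student", "Person")},
    rel={"enrolledIn": ("Student", "Course")},
)
print(("Student", "Person") in o.taxonomy)
```

The taxonomy HC carries the subsumption hierarchy, while rel attaches domain and range restrictions to each non-taxonomic relation.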
A Survey of Ontology Applications in Semantic Web Services
Qian Qiaoneng, Liu Yajun (Department of Computer Science and Engineering, Southeast University, Nanjing 210096). Computer and Digital Engineering, 2005, 33(11): 126–129. Abstract: This paper introduces the basic principles of Web Services and the basic concepts of Ontology. Based on the combination of these two technologies, it briefly describes the development of several ontology-based Web languages, focusing on the OWL language and on OWL-S, its main application to Web Services. Finally, it introduces some extended applications of ontologies in Semantic Web services.
Analysis and Research of Current Ontology Editing Tools
1 Introduction

An ontology [1] is a formal, explicit specification of a shared conceptualization. "Conceptualization" refers to an abstract model of some phenomena in the world that identifies the concepts relevant to those phenomena. "Explicit" means that the types of the concepts used, and the constraints on their use, are explicitly defined. "Formal" means that the ontology is machine-processable. "Shared" indicates that an ontology captures knowledge agreed upon by a community: it is not restricted to a few individuals but accepted by the whole group. In the seven-layer Semantic Web architecture proposed by the Web's inventor Tim Berners-Lee [2] (the URI and Unicode layer; the XML + NS + XML Schema layer; the RDF and RDF Schema layer; the Ontology Vocabulary layer; and the Logic, Proof and Trust layers), ontology sits at the fourth layer. Its goal [3] is to capture the knowledge of a domain, provide a common understanding of that knowledge, identify the commonly recognized vocabulary of the domain, and give explicit definitions of these terms and of the relations between them, describing the semantics of concepts through the relations among them. Ontology is the key enabling technology of the Semantic Web and the basis for sharing and exchanging Web information at the semantic level, so the ability to build and edit ontologies efficiently has become an important task for the Semantic Web.
2 Ontology Languages

The concrete representation of an ontology requires a description language. For applications on the Web, a common standard language is needed so that translation between different description languages can be avoided. Since XML is accepted as the standard language for representing data on the Web, researchers have developed Web ontology description languages based on XML syntax, including RDF(S), OIL, DAML+OIL and OWL. RDF [4] (Resource Description Framework) provides a data model of objects (resources) and the relations between them, expressible in XML syntax; RDF Schema is a vocabulary for describing the properties and classes of RDF resources, attaching semantics to their generalization hierarchies.
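The RDF data model reduces to (subject, predicate, object) triples; a minimal sketch with one wildcard pattern query, where the shortened URIs are hypothetical:

```python
# A tiny in-memory triple store with a single pattern-matching operation.
triples = [
    ("ex:OntoEdit", "rdf:type", "ex:Editor"),
    ("ex:OntoEdit", "ex:supports", "ex:RDFS"),
    ("ex:Protege", "rdf:type", "ex:Editor"),
]

def match(s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

print(match(p="rdf:type", o="ex:Editor"))
```

RDF query languages such as SPARQL generalize exactly this kind of triple-pattern matching over graphs of such statements.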
OIL (Ontology Interchange Language) extends RDF(S) with a much richer set of modeling constructs.
Research on the Automatic Construction of OWL Ontologies from Text
February 2010 issue, Artificial Intelligence and Recognition Technology column, China Computer & Communication (《信息与电脑》).

1. Background

In recent years, the idea of the Semantic Web [1], proposed by Berners-Lee, the "father of the World Wide Web", has gained increasingly wide recognition and attention. In the layered Semantic Web architecture, ontology occupies the core layer. "An ontology is an explicit specification of a conceptualization"; through the semantic relations between concepts that an ontology describes, logical inference can be performed at the semantic level, enabling semantic retrieval. Current ontologies are mostly built by hand from experts or dictionaries; the degree of automation is very low, the process is tedious and time-consuming, and it is difficult to update an ontology repository automatically as human cognitive structures evolve. Automating ontology construction is therefore a pressing problem. The key to constructing an ontology is to find the concepts and the relations between them. In philosophy, a concept is understood as a unit of thought composed of an extension and an intension: the extension of a concept is the set of all objects belonging to the concept, and the intension is the set of attributes shared by all these objects.
Formal Concept Analysis (FCA), proposed by the German professor Wille, determines the relations between objects precisely by finding their shared attributes [2]. Internationally, research on ontology learning with FCA is still at an exploratory stage. FCA yields a data structure called a concept lattice, whose hierarchy reveals the generalization and specialization relations between concepts, but a concept lattice still falls short of an ontology. Building on previous work, this paper implements a method that uses FCA to automatically extract concepts and their semantic relations from text and finally generate a standard OWL ontology, significantly raising the degree of automation of ontology construction. The above has introduced the content, background and state of the art of this research. Section 2 introduces formal contexts and the method of formal concept analysis; Section 3 analyzes the structural characteristics of the concept lattice obtained by FCA and the semantic elements of the OWL language, and gives an algorithm for converting the former into the latter; Section 4 imports the resulting OWL ontology into Protégé for inference-based validation and correction, yielding a semantically rich general-purpose ontology.
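The core of the lattice-to-OWL conversion is that the lattice's generalization/specialization edges become subclass axioms. A heavily simplified sketch follows; the concept names and the plain-text OWL serialization are illustrative assumptions, and the paper's actual algorithm covers far more of OWL's semantics:

```python
# Each cover edge (sub, super) of the concept lattice becomes an
# rdfs:subClassOf axiom in an OWL/XML-style serialization.
lattice_edges = [("GraduateStudent", "Student"), ("Student", "Person")]

def to_owl(edges):
    lines = []
    for sub, sup in edges:
        lines.append(f'<owl:Class rdf:ID="{sub}">')
        lines.append(f'  <rdfs:subClassOf rdf:resource="#{sup}"/>')
        lines.append('</owl:Class>')
    return "\n".join(lines)

print(to_owl(lattice_edges))
```

Emitting the hierarchy this way is what lets a tool like Protégé load the result and check it with a reasoner, as described for Section 4.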
Finally, Section 5 summarizes the work and proposes directions for further improvement.

2. From Formal Context to Concept Lattice

2.1 Formal contexts. The study of formal concept analysis (FCA) usually starts from the basic notion of a formal context (e.g., the left part of Figure 1) [3].
Semantic Networks and the Semantic Web
Ontology and the Semantic Web

• In 1990 Tim Berners-Lee invented the hypertext system of the Internet, putting networking technology to use for the exchange and sharing of information and promoting the development of the Internet.
• The hypertext system does not describe the meaning of information, yet what people really care about is content: the meaning carried by the texts, images and other resources on the Internet.
• At the 2000 world XML (Extensible Markup Language) conference, Tim Berners-Lee gave a talk entitled "Semantic Web", explaining the concept of the Semantic Web and proposing its architecture.
Propositional Semantic Networks

• Introducing AND and OR nodes allows more complex meanings to be expressed.
• Example: "leaving strings of footprints, some deep, some shallow, some straight, some crooked".
Predicate Semantic Networks

• Hendrix proposed the technique of network partitioning to handle variables and quantifiers in predicates.
• When a semantic network represents a complex proposition, the proposition can be split into several sub-propositions, each represented by a simple semantic network called a space. The complex proposition forms a large space and the sub-propositions form subspaces, each of which can itself be viewed as a node in the large space. Subspaces can be nested layer by layer and can also be connected to each other by arcs.
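The partitioning idea can be sketched as nested spaces, each a small network of its own, with a subspace usable as a node of its parent (using the footprint example above; all structure names are illustrative):

```python
# Each space holds its own nodes and arcs; subspaces are nested and can be
# referenced as nodes of the parent space.
s1 = {"nodes": {"footprint", "deep"}, "arcs": [("footprint", "property", "deep")]}
s2 = {"nodes": {"footprint", "shallow"}, "arcs": [("footprint", "property", "shallow")]}
big = {
    "nodes": {"leave-footprints", "#s1", "#s2"},   # "#s1" refers to subspace s1
    "arcs": [("leave-footprints", "AND", "#s1"),
             ("leave-footprints", "AND", "#s2")],
    "subspaces": {"s1": s1, "s2": s2},
}

def subspace(space, name):
    """Descend into a nested subspace by name."""
    return space["subspaces"][name]

print(sorted(subspace(big, "s1")["nodes"]))
```

Treating a whole subspace as a single node is what lets the formalism scope a quantifier or a conjunct over an entire sub-network rather than over individual nodes.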
Ontology and the Semantic Web

• Ontology occupies the core position in Semantic Web technology.
• Ontology was originally a basic concept of philosophy: the theory that studies what makes "being" be "being".
• Introduced into artificial intelligence, "ontology" is used as the essential representation of a conceptualization: it defines the terms and relations of a domain, and the set of domain rules formed from them, in a formal, machine-readable way.
• The concepts defined in an Ontology, together with their concrete use in specific domains (instance data), form the basis of the Semantic Web.
Ontology Learning for the Semantic Web

Alexander Maedche and Steffen Staab
Institute AIFB, D-76128 Karlsruhe, Germany
http://www.aifb.uni-karlsruhe.de/WBS
and Ontoprise GmbH, Haid-und-Neu-Strasse 7, 76131 Karlsruhe, Germany

Word Count: 5541

Abstract

The Semantic Web relies heavily on the formal ontologies that structure underlying data for the purpose of comprehensive and transportable machine understanding. Therefore, the success of the Semantic Web depends strongly on the proliferation of ontologies, which requires fast and easy engineering of ontologies and avoidance of a knowledge acquisition bottleneck. Ontology Learning greatly facilitates the construction of ontologies by the ontology engineer. The vision of ontology learning that we propose here includes a number of complementary disciplines that feed on different types of unstructured, semi-structured and fully structured data in order to support a semi-automatic, cooperative ontology engineering process. Our ontology learning framework proceeds through ontology import, extraction, pruning, refinement, and evaluation, giving the ontology engineer a wealth of coordinated tools for ontology modeling. Besides the general framework and architecture, we show in this paper some exemplary techniques in the ontology learning cycle that we have implemented in our ontology learning environment, Text-To-Onto, such as ontology learning from free text, from dictionaries, or from legacy ontologies, and refer to some others that need to complement the complete architecture, such as reverse engineering of ontologies from database schemata or learning from XML documents.

Ontologies for the Semantic Web

Conceptual structures that define an underlying ontology are germane to the idea of machine processable data on the Semantic Web. Ontologies are (meta)data schemas, providing a controlled vocabulary of concepts, each with an explicitly defined and machine processable semantics. By defining shared and common domain theories, ontologies help both people and machines to
communicate concisely, supporting the exchange of semantics and not only syntax. Hence, the cheap and fast construction of domain-specific ontologies is crucial for the success and the proliferation of the Semantic Web. Though ontology engineering tools have become mature over the last decade (cf. [2]), the manual acquisition of ontologies still remains a tedious, cumbersome task resulting easily in a knowledge acquisition bottleneck. Having developed our ontology engineering workbench, OntoEdit, we had to face exactly this issue; in particular, we were given questions like:

Can you develop an ontology fast? (time)
Is it difficult to build an ontology? (difficulty)
How do you know that you've got the ontology right? (confidence)

In fact, these problems of time, difficulty and confidence that we ended up with were similar to what knowledge engineers had dealt with over the last two decades when they elaborated methodologies for knowledge acquisition or workbenches for defining knowledge bases. A method that proved extremely beneficial for the knowledge acquisition task was the integration of knowledge acquisition with machine learning techniques [12]. The drawback of these approaches, e.g. the work described in [6], however, was their rather strong focus on structured knowledge or data bases, from which they induced their rules. In contrast, in the Web environment that we encounter when building Web ontologies, the structured knowledge or data base is rather the exception than the norm. Hence, intelligent means for an ontology engineer take on a different meaning than the (very seminal) integration architectures for more conventional knowledge acquisition [1]. Our notion of Ontology Learning aims at the integration of a multitude of disciplines in order to facilitate the construction of ontologies, in particular machine learning. Because the fully automatic acquisition of knowledge by machines remains in the distant future, we consider the process of ontology learning as semi-automatic with human
intervention, adopting the paradigm of balanced cooperative modeling [5] for the construction of ontologies for the Semantic Web. With this objective in mind, we have built an architecture that combines knowledge acquisition with machine learning, feeding on the resources that we nowadays find on the syntactic Web, viz. free text, semi-structured text, schema definitions (DTDs), etc. Thereby, modules in our framework serve different steps in the engineering cycle, which here consists of the following five steps (cf. Figure 1): First, existing ontologies are imported and reused by merging existing structures or defining mapping rules between existing structures and the ontology to be established. For instance, [9] describe how ontological structures contained in Cyc are used in order to facilitate the construction of a domain-specific ontology. Second, in the ontology extraction phase, major parts of the target ontology are modeled with learning support feeding from web documents. Third, this rough outline of the target ontology needs to be pruned in order to better adjust the ontology to its prime purpose. Fourth, ontology refinement profits from the given domain ontology, but completes the ontology at a fine granularity (also in contrast to extraction). Fifth, the prime target application serves as a measure for validating the resulting ontology [11]. Finally, one may revolve again in this cycle, e.g. for including new domains into the constructed ontology or for maintaining and updating its scope.

Figure 1: Ontology Learning process steps

An Architecture for Ontology Learning

Given the task of constructing and maintaining an ontology for a Semantic Web application, e.g. for an ontology-based knowledge portal that we have been dealing with (cf. [10]), we have produced a wish list of what kind of support we would fancy.

Ontology Engineering Workbench OntoEdit. As the core of our approach we have built a graphical user interface to support the ontology engineering process manually performed by the ontology
engineer. Here, we offer sophisticated graphical means for manual modeling and refining of the final ontology. Different views are offered to the user, targeting the epistemological level rather than a particular representation language. However, the ontological structures built there may be exported to standard Semantic Web representation languages, such as OIL and DAML-ONT, as well as our own F-Logic based extensions of RDF(S). In addition, executable representations for constraint checking and application debugging can be generated and then accessed via SilRi, our F-Logic inference engine, which is directly connected with OntoEdit.

The sophisticated ontology engineering tools we knew, e.g. the Protégé modeling environment for knowledge-based systems [2], would offer capabilities roughly comparable to OntoEdit. However, given the task of constructing a knowledge portal, we found a large conceptual gap between the ontology engineering tool and the input (often legacy data), such as Web documents, Web document schemata, databases on the Web, and Web ontologies, which ultimately determined the target ontology. Into this void we have positioned new components of our ontology learning architecture (cf. Figure 2). The new components support the ontology engineer in importing existing ontology primitives, extracting new ones, pruning given ones, or refining with additional ontology primitives. In our case, the ontology primitives comprise: a set of strings that describe lexical entries for concepts and relations; a set of concepts (concepts in our framework are roughly akin to synsets in WordNet [4]); a taxonomy of concepts with multiple inheritance (heterarchy); a set of non-taxonomic relations described by their domain and range restrictions; a heterarchy of relations, i.e. a set of taxonomic relations between relations; reference relations that relate concepts and relations with their lexical entries, respectively; and, finally, a set of axioms that describe additional constraints on the
ontology and allow implicit facts to be made explicit [10].

Figure 2: Architecture for Learning Ontologies for the Semantic Web

This structure corresponds closely to RDFS; the one exception is the explicit consideration of lexical entries. The separation of concept reference and concept denotation, which may be easily expressed in RDF, makes it possible to provide very domain-specific ontologies without incurring an instantaneous conflict when merging ontologies, a standard request in the Semantic Web. For instance, the lexical entry "school" may refer to a building in ontology A, but to an organization in ontology B, or to both in ontology C. Also, in ontology A the concept referred to in English by "school" and "school building" may be referred to in German by "Schule" and "Schulgebäude".

Ontology learning relies on ontology structures given along these lines and on input data as described above in order to propose new knowledge about reasonably interesting concepts, relations, lexical entries, or about links between these entities, proposing the addition, the deletion, or the merging of some of them. The results of the ontology learning process are presented to the ontology engineer by the graphical result set representation (cf. Figure 4 for an example of how extracted properties may be presented). The ontology engineer may then browse the results and decide to follow, delete, or modify the proposals in accordance with the purpose of her task.

Components for Learning Ontologies

Integrating the considerations from above into a coherent generic architecture for extracting and maintaining ontologies from data on the Web, we have identified several core components. There are, (i), a generic management component dealing with delegation of tasks and constituting the infrastructure backbone, (ii), a resource processing component working on input data from the Web including, in particular, a natural language processing
system, (iii), an algorithm library working on the output of the resource processing component as well as the ontology structures sketched above and returning result sets as mentioned above and, (iv), the graphical user interface for ontology engineering, OntoEdit.

Management component. The ontology engineer uses the management component to select input data, i.e. relevant resources such as HTML & XML documents, document type definitions, databases, or existing ontologies that are exploited in the further discovery process. Secondly, using the management component, the ontology engineer also chooses among a set of resource processing methods available at the resource processing component and among a set of algorithms available in the algorithm library. Furthermore, the management component even supports the ontology engineer in discovering task-relevant legacy data, e.g. an ontology-based crawler gathers HTML documents that are relevant to a given core ontology, and an RDF crawler follows URIs (i.e., unique identifiers in XML/RDF) that are also URLs in order to cover parts of the so far tiny, but growing, Semantic Web.

Resource processing component. Resource processing strategies differ depending on the type of input data made available: HTML documents may be indexed and reduced to free text. Semi-structured documents, like dictionaries, may be transformed into a predefined relational structure. Semi-structured and structured schema data (like DTDs, structured database schemata, and existing ontologies) are handled following different strategies for import, as described later in this paper. For processing free natural text our system accesses the natural language processing system SMES (Saarbrücken Message Extraction System), a shallow text processor for German (cf. [7]). SMES comprises a tokenizer based on regular expressions, a lexical analysis component including various word lexicons, a morphological analysis module, a named entity recognizer, a part-of-speech tagger and a chunk parser. After first
preprocessing according to one of these or similar strategies, the resource processing module transforms the data into an algorithm-specific relational representation.

Algorithm Library. As described above, an ontology may be described by a number of sets of concepts, relations, lexical entries, and links between these entities. An existing ontology definition (comprising all the primitives listed above) may be acquired using various algorithms working on this definition and the preprocessed input data. While specific algorithms may greatly vary from one type of input to the next, there is also considerable overlap concerning underlying learning approaches like association rules, formal concept analysis, or clustering. Hence, we may reuse algorithms from the library for acquiring different parts of the ontology definition.

Subsequently, we introduce some of these algorithms available in our implementation. In general, we use a multi-strategy learning and result combination approach, i.e. each algorithm that is plugged into the library generates normalized results that adhere to the ontology structures sketched above and that may be combined into a coherent ontology definition.

Import & Reuse

Given our experiences in medicine, telecommunication, and insurance, we expect that for almost any commercially significant domain some kind of domain conceptualization is available.
Thus, we need mechanisms and strategies to import and reuse domain conceptualizations from existing (schema) structures. Thereby, the conceptualizations may be recovered, e.g., from legacy database schemata, document-type definitions (DTDs), or from existing ontologies that conceptualize some relevant part of the target ontology.

In the first part of the import & reuse step, the schema structures are identified and their general content needs to be discussed with domain experts. Each of these knowledge sources must be imported separately. Import may be performed manually, which may include the manual definition of transformation rules. Alternatively, reverse engineering tools, such as those that exist for recovering extended entity-relationship diagrams from the SQL description of a given database (cf. references [19, 11] in the survey, Table 1), may facilitate the recovery of conceptual structures.

In the second part of the import & reuse step, imported conceptual structures need to be merged or aligned in order to constitute a single common ground from which to take off into the subsequent ontology learning phases of extracting, pruning and refining. While the general research issue concerning merging and aligning is still an open problem, recent proposals (e.g., [8]) have shown how to improve the manual process of merging/aligning. Existing methods for merging/aligning mostly rely on matching heuristics for proposing the merge of concepts and similar knowledge-base operations.
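A minimal sketch of one such matching heuristic, not taken from any of the cited tools: it proposes merge candidates between two ontologies purely from the lexical similarity of concept labels. The label sets and the threshold are invented for illustration:

```python
from difflib import SequenceMatcher

# Toy concept label sets from two ontologies to be aligned (illustrative only).
onto_a = {"Hotel", "Room", "SchoolBuilding"}
onto_b = {"Hostel", "HotelRoom", "School"}

def merge_proposals(a, b, threshold=0.6):
    """Propose concept pairs whose label similarity exceeds a threshold.

    A real merge/align tool would combine such lexical evidence with
    structural evidence (shared super-concepts, instance overlap) before
    presenting proposals to the ontology engineer.
    """
    scored = [
        (ca, cb, SequenceMatcher(None, ca.lower(), cb.lower()).ratio())
        for ca in a
        for cb in b
    ]
    return sorted(
        [(ca, cb, round(s, 2)) for ca, cb, s in scored if s >= threshold],
        key=lambda t: -t[2],
    )

# (Hotel, Hostel) ranks first; borderline pairs such as
# (SchoolBuilding, School) land near the threshold.
print(merge_proposals(onto_a, onto_b))
```

As in the architecture described above, the output is only a ranked list of proposals; the decision to merge remains with the engineer.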
Our current research also integrates mechanisms that use an application-data-oriented, bottom-up approach. For instance, formal concept analysis makes it possible to discover patterns between application data on the one hand and the usage of concepts and relations and the semantics given by their heterarchies on the other hand in a formally concise way (cf. reference [7] in the survey, Table 1, on formal concept analysis).

Overall, the import and reuse step in ontology learning seems to be the one that is hardest to generalize. The task is vaguely reminiscent of the general problems encountered in data warehousing, but it adds challenging problems of its own.

Extracting Ontologies

In the ontology extraction phase of the ontology learning process, major parts, i.e. the complete ontology or large chunks reflecting a new subdomain of the ontology, are modeled with learning support exploiting various types of (Web) sources. Thereby, ontology learning techniques partially rely on given ontology parts. Thus, we here encounter an iterative model where previous revisions through the ontology learning cycle may propel subsequent ones, and where more sophisticated algorithms may work on structures proposed by more straightforward ones. Describing this phase, we sketch some of the techniques and algorithms that have been embedded in our framework and implemented in our ontology learning environment Text-To-Onto (cf. Figure 3).
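As a concrete illustration of one of the underlying approaches shared by these algorithms, the formal concept analysis mentioned above for the bottom-up analysis of application data can be sketched in a few lines. The toy context below (documents as objects, concept and relation usage as attributes) is purely illustrative:

```python
from itertools import combinations

# Toy formal context: application documents (objects) x used concepts/relations
# (attributes). All names are invented for illustration.
context = {
    "doc1": {"Hotel", "hasRoom"},
    "doc2": {"Hotel", "hasRoom", "hasSauna"},
    "doc3": {"YouthHostel", "hasRoom"},
}
objects = set(context)
attributes = set().union(*context.values())

def intent(objs):
    """Attributes shared by every object in objs (all attributes for the empty set)."""
    return set.intersection(*(context[o] for o in objs)) if objs else set(attributes)

def extent(attrs):
    """Objects that carry every attribute in attrs."""
    return {o for o in objects if attrs <= context[o]}

# A formal concept is a closed pair (extent, intent).
# Brute force: close every subset of objects under the two derivation maps.
concepts = {
    (frozenset(extent(intent(set(objs)))), frozenset(intent(set(objs))))
    for r in range(len(objects) + 1)
    for objs in combinations(sorted(objects), r)
}

# Ordering concepts by extent inclusion yields the concept lattice, from which
# taxonomic proposals can be read off, e.g. Hotel below the general hasRoom concept.
for ext, att in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(ext), "<->", sorted(att))
```

This brute-force enumeration is exponential and only meant to convey the idea; practical FCA implementations use incremental algorithms such as next-closure.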
Doing so, we cover a very substantial part of the overall ontology learning task in the extraction phase. Text-To-Onto proposes many different ontology components, which we have described above, to the ontology engineer, feeding on several types of input.

Figure 3: Screenshot of our Ontology Learning Workbench Text-To-Onto

Lexical Entry & Concept Extraction. This technique is one of the baseline methods applied in our framework for acquiring lexical entries with corresponding concepts. In Text-To-Onto, web documents are morphologically processed, including the treatment of multi-word terms such as "database reverse engineering" by N-grams, a simple statistical means. Based on this text preprocessing, term extraction techniques, which are based on (weighted) statistical frequencies, are applied in order to propose new lexical entries for the lexicon.

Often, the ontology engineer follows the proposal by the lexical entry & concept extraction mechanism and includes a new lexical entry in the ontology. Because the new lexical entry comes without an associated concept, the ontology engineer must then decide (possibly with help from further processing) whether to introduce a new concept or link the new lexical entry to an existing concept.

Hierarchical Concept Clustering. Given a lexicon and a set of concepts, one major next step is the taxonomic classification of concepts. One generally applicable method in this regard is hierarchical clustering. Hierarchical clustering exploits the similarity of items in order to propose a hierarchy of item categories. The similarity measure is defined on the properties of items.

Given the task of extracting a hierarchy from natural language text, adjacency of terms and syntactical relationships between terms are two properties that yield considerable descriptive power for inducing the semantic hierarchy of concepts related to these terms. A sophisticated example of hierarchical clustering is given by Faure & Nedellec (cf. reference [6] in the survey, Table 1): They present a cooperative
machine learning system, ASIUM, which acquires taxonomic relations and subcategorization frames of verbs based on syntactic input. The ASIUM system hierarchically clusters nouns based on the verbs that they are syntactically related with, and vice versa. Thus, they cooperatively extend the lexicon, the set of concepts, and the concept heterarchy.

Dictionary Parsing. Machine-readable dictionaries (MRDs) are frequently available for many domains. Though their internal structure is free text to a large extent, there are comparatively few patterns that are used to give text definitions. Hence, MRDs exhibit a large degree of regularity that may be exploited for extracting a domain conceptualization and proposing it to the ontology engineer.

Text-To-Onto has been used to generate a taxonomy of concepts from a machine-readable dictionary of an insurance company (cf. reference [13] in the survey, Table 1). As with term extraction from free text, morphological processing is applied, this time, however, complemented by several pattern-matching heuristics. For example, the dictionary contained the following entry:

Automatic Debit Transfer: Electronic service arising from a debit authorization of the Yellow Account holder for a recipient to debit bills that fall due direct from the account.

Several heuristics were applied to the morphologically analyzed definitions. For instance, one simple heuristic relates the definition term, here "automatic debit transfer", with the first noun phrase occurring in the definition, here "electronic service". Their corresponding concepts are linked in the heterarchy: (AUTOMATIC DEBIT TRANSFER, ELECTRONIC SERVICE). Applying this heuristic iteratively, one may propose large parts of the target ontology, more precisely new concepts and taxonomic relations, to the ontology engineer. In fact, because verbs tend to be modeled as relations, the set of relations (and its linkage with the lexical entries) may be extended in this way, too.

Association Rules. Association rule learning algorithms are typically used for prototypical applications of data mining, like finding
associations that occur between items, e.g. supermarket products, in a set of transactions, e.g. customers' purchases. The generalized association rule learning algorithm extends its baseline by aiming at descriptions at the appropriate level of the taxonomy, e.g. "snacks are purchased together with drinks" rather than "chips are purchased with beer" and "peanuts are purchased with soda".

In Text-To-Onto (cf. reference [14] in the survey, Table 1) we use a modification of the generalized association rule learning algorithm for discovering properties between classes. A given class hierarchy serves as background knowledge. Pairs of syntactically related classes (e.g. the pair (FESTIVAL, ISLAND) describing the head-modifier relationship contained in the sentence "The festival on Usedom attracts tourists from all over the world.", Usedom being an island in the Baltic Sea in the north-east of Germany) are given as input to the algorithm. The algorithm generates association rules, comparing the relevance of different rules while climbing up and/or down the taxonomy. The seemingly most relevant binary rules are proposed to the ontology engineer for modeling relations into the ontology, thus extending the set of non-taxonomic relations.

As the number of generated rules is typically high, we offer various modes of interaction. For example, it is possible to restrict the number of suggested relations by defining so-called restriction classes that have to participate in the relations that are extracted. Another way of focusing is the flexible enabling/disabling of the use of taxonomic knowledge for extracting relations. Results are presented in various views, as depicted in Figure 4. A generalized relation that may be induced from the partially given example data above is PROPERTY(EVENT, AREA), which may be named by the ontology engineer locatedIn, viz. EVENTs are located in an AREA (thus extending the set of relations and its lexical references). The user may add the extracted relations to the ontology by drag-and-drop. To explore and determine the right aggregation
level of adding a relation to the ontology, the user may browse the hierarchy view on extracted properties as given in the left part of Figure 4. This view may also support the ontology engineer in defining appropriate subPropertyOf relations between properties, such as subPropertyOf(hasDoubleRoom, hasRoom), thereby extending the heterarchy of relations.

Figure 4: Result Presentation in Text-To-Onto

Pruning the Ontology

A common theme of modeling in various disciplines is the balance between completeness and scarcity of the domain model. It is a widely held belief that targeting completeness of the domain model on the one hand is practically unmanageable and computationally intractable, and that targeting the scarcest model on the other hand is overly limiting with regard to expressiveness. Hence, we strive for a workable balance between the two: a model that captures a rich conceptualization of the target domain, but that excludes the parts that are out of its focus. The import & reuse of ontologies as well as the extraction of ontologies considerably pull the lever of the scale into the imbalance where out-of-focus concepts reign. Therefore, we pursue the appropriate diminishing of the ontology in the pruning phase.

There are at least two dimensions to the problem of pruning. First, one needs to clarify how the pruning of particular parts of the ontology (e.g., the removal of a concept or a relation) affects the rest. For instance, Peterson et al. [9] have described strategies that leave the user with a coherent ontology (i.e. no dangling or broken links). Second, one may consider strategies for proposing ontology items that should be either kept or pruned. We have investigated several mechanisms for generating proposals from application data. Given a set of application-specific documents, there are several strategies for pruning the ontology. They are based on absolute or relative counts of term frequency (cf. reference [13] in the survey, Table 1).

Refining the
Ontology

Refining plays a similar role to extracting. The difference between them lies on a sliding scale rather than in a clear-cut distinction. While extracting serves mostly for cooperative modeling of the overall ontology (or at least of very significant chunks of it), the refinement phase is about fine-tuning the target ontology and supporting its evolving nature. The refinement phase may use data that comes from the concrete Semantic Web application, e.g. log files of user queries or generic user data. Adapting and refining the ontology with respect to user requirements plays a major role for the acceptance of the application and its further development.

In principle, the same algorithms may be used for extraction as for refinement. However, during refinement one must consider in detail the existing ontology and the existing connections into the ontology, while extraction more often than not works practically from scratch.

A prototypical approach to refinement (though not to extraction!) has been presented by Hahn & Schnattinger (cf. reference [8] in the survey, Table 1). They have introduced a methodology for automating the maintenance of domain-specific taxonomies. An ontology is incrementally updated as new concepts are acquired from text. The acquisition process is centered around the linguistic and conceptual "quality" of various forms of evidence underlying the generation and refinement of concept hypotheses. In particular, they consider semantic conflicts and analogous semantic structures between the knowledge base and the ontology in order to determine the quality of a particular proposal. Thus, they extend an existing ontology with new lexical entries, new concepts, and new relations.

Challenges

Ontology Learning may add significant leverage to the Semantic Web, because it propels the construction of domain ontologies, which are needed quickly and cheaply for the Semantic Web to succeed. We have presented a comprehensive framework for Ontology Learning that crosses the boundaries of
single disciplines, touching on a number of challenges. Table 1 gives a survey of the types of techniques that should be included in a full-fledged ontology learning and engineering environment. The good news, however, is that one does not need perfect or optimal support for cooperative modeling of ontologies. At least according to our experience, "cheap" methods in an integrated environment may yield tremendous help for the ontology engineer.

While a number of problems remain within the single disciplines, some more challenges come up regarding the particular problem of Ontology Learning for the Semantic Web. First, with the XML-based namespace mechanisms, the notion of an ontology with well-defined boundaries, e.g. only definitions that are in one file, will disappear. Rather, the Semantic Web may yield an "amoeba-like" structure regarding ontology boundaries, because ontologies refer to and import each other (cf. e.g. the DAML-ONT primitive import). However, it is not yet clear what the semantics of these structures will look like. In light of these facts, the importance of methods like ontology pruning and crawling of ontologies will increase still further. Second, we have so far restricted our attention in ontology learning to the conceptual structures that are (almost) contained in RDF(S) proper. Additional semantic layers on top of RDF (e.g. future OIL or DAML-ONT with axioms) will require new means for improved ontology engineering with axioms, too!

Acknowledgements. We thank our students, Dirk Wenke and Raphael Volz, for work on OntoEdit and Text-To-Onto. Research for this paper was partially financed by Ontoprise GmbH, Karlsruhe, Germany, by the US Air Force in the DARPA DAML project "OntoAgents", by the European Union in the IST-1999-10132 project "On-To-Knowledge", and by the German BMBF in the project "GETESS" (01IN901C0).