高效的XML关键字查询改写和结果生成技术
基于LCA的高效XML关键字检索算法
H i h。f i i n g - f c e tXM L e wo d Re r e a g r t e K y r t i v lAl o ihm s d o Ba e n LCA
HAN e g M n . CHEN n W ANG n Qu , Pe g
( c o fCo p trS in ea dTe h lg S ho lo m u e ce c n c noo y,No t we tr ltc ia ie st r h se n Poy ehnc lUnv riy,Xin 7 01 9 ' 1 2 ,Chi ) a na
文献标识码: A
中图 分类号:P1 T31
基 于 L A 的 高 效 XML 关键 字检 索算 法 C
韩 萌, 陈 群, 王 鹏
( 西北工业大学计算机 学院 , 西安 7 0 2 ) 1 1 9
摘 要 : E C 的 语 义 为 基 础 , 析 E A 的诸 多性 质 , 出 EL A 结 果 查 找 算 法 复 杂度 高 的原 因。 在 其 基 础 上提 出 B 以 LA 分 I C 给 C HFA 算 法 , 包括 2种
界 面 , 得 用户 在 不 熟 悉 X 使 ML 文 档 结 构 和 复 杂 的 查 询 语 言
孙 节 点 中 包 含 的 所 有 关 键 字 后 还 能包 含所 有关 键 字 至少 一 次 的结 点 集 合 。例 如 , 图 1中 , 询 Q一 {XML” “ h n ) 在 查 ‘ ‘ , C e ” 的
as n r d c sa lo ito u e n XM L y r ere lago ih ,Bo t m- p Hirr hc lFi e igAlo ih ( ke wo dr tiva l rt m to u e a c ia l rn g rtm BH FA) nt eb sso h r p risa o e t ,o h a i ft ep o e t b v . e
高效的XML关键字查询改写和结果生成技术
高效的XML关键字查询改写和结果生成技术
黄静;陆嘉恒;孟小峰
【期刊名称】《计算机研究与发展》
【年(卷),期】2010(047)005
【摘要】用户使用关键字查询时可能不能准确地表达他们的意图,即使用户正确地表达了查询意图,查询引擎也可能不能准确地返回查询结果.针对这一问题,重点研究了在XML关键字查询中如何进行有效的查询改写并生成有意义的结果.提出4种查询改写操作和查询改写代价的概念,给出了动态规划的方法计算查询改写代价.为了找出最优的查询改写,给出了基于栈的查询改写和结果生成算法,并提出了基于划分的优化算法.最后通过丰富的实验对提出的方法进行了验证.
【总页数】8页(P841-848)
【作者】黄静;陆嘉恒;孟小峰
【作者单位】中国人民大学信息学院,北京,100872;中国人民大学信息学院,北京,100872;中国人民大学信息学院,北京,100872
【正文语种】中文
【中图分类】TP391
【相关文献】
1.XML中支持top-k的关键字查询方法研究 [J], 崔婉秋;李昕;孟祥福;崔岩;王大伟
2.面向XML关键字查询的高效RKN求解策略 [J], 陈子阳;王璿;汤显
3.FastMatch:一种高效的XML关键字查询算法 [J], 崔健;周军锋;郭景峰
4.一种基于区间预留编码的XML关键字查询算法 [J], 魏东平; 罗丹
5.PrList:一种高效的不确定XML关键字查询算法 [J], 张晓琳;苏龙超;韩雨童;刘立新
因版权原因,仅展示原文概要,查看原文内容请购买。
一种基于关键字的xml文档查询算法
一种基于关键字的xml文档查询算法一种基于关键字的XML文档查询算法是一种快速查找XML文档中特定内容的方法。
它利用关键字来查找XML文档中含有特定内容的节点,从而帮助用户快速定位相应的内容。
XML(Extensible Markup Language)是一种可扩展的标记语言,它使用元素和属性来表示数据。
XML文档由XML 元素(也称为节点)和属性组成,每个节点可以包含其他节点和属性,因此XML文档可以形成复杂的树形结构。
由于XML文档的结构组织得更好,因此在XML文档中进行查找会比在普通文本文档中更加方便高效。
基于关键字的XML文档查询算法利用关键字来识别XML文档中含有特定内容的节点。
该算法的基本思想是:首先将关键字分割为独立的单词,然后对XML文档进行遍历,当发现节点属性中含有关键字时,就将该节点的内容添加到结果列表中。
最后,将查询到的结果返回给用户。
基于关键字的XML文档查询算法的实现步骤如下:1、确定关键字:首先需要确定关键字,即需要查询的内容。
2、分割关键字:将关键字分割为独立的单词。
3、遍历XML文档:将XML文档进行遍历,以检索特定内容。
4、判断关键字:检查节点属性中是否存在关键字,如果存在则将该节点的内容加入到结果列表中。
5、返回结果:将查询到的结果返回给用户。
基于关键字的XML文档查询算法有很多优点:首先,比起其他文档查询算法,它的查询速度更快,而且可以查询出更多的信息;其次,它可以查找出XML文档中任意层级的节点,并能从多个节点中检索出相关内容;最后,它支持XPath查询,可以查询出XML文档中满足XPath条件的节点信息。
总之,基于关键字的XML文档查询算法是一种非常有效的查找XML文档中特定内容的方法,它可以让用户快速定位到相应的内容,并且具备良好的查询效率和准确性。
xml格式讲解
xml格式讲解摘要:1.XML简介2.XML的基本语法3.标签和属性4.解析XML5.XML的应用场景正文:一、XML简介XML(可扩展标记语言)是一种用于描述数据结构和数据的标记语言。
它源于1998年由万维网联盟(W3C)推出的标准。
XML的设计目标是简化数据的共享和传输,使得不同的系统和平台能够互相理解数据。
与HTML相比,XML更加灵活和可扩展,适用于各种类型的数据。
二、XML的基本语法1.声明:XML文档的开始部分需要有一个声明,示例如下:```<?xml version="1.0" encoding="UTF-8"?>```2.元素:XML文档由多个嵌套的元素组成。
每个元素由开始标签、结束标签和中间的内容组成。
例如:```<root><child1>内容1</child1><child2>内容2</child2></root>```3.命名规则:XML元素名称必须遵循以下规则:- 名称以字母或下划线开头(首字母大写或小写均可);- 名称中间不能有空格;- 名称中只能包含字母、数字、连字符、下划线和点号;- 名称区分大小写;- 顶级元素(如`<root>`)必须使用名词。
三、标签和属性1.标签:XML标签用于标识文档中的不同部分。
标签可以分为开始标签(如`<root>`)和结束标签(如`</root>`)。
2.属性:XML元素可以使用属性来提供附加信息。
属性位于开始标签内,如下所示:```<root attr1="value1" attr2="value2">```3.属性值:XML属性值可以使用引号(单引号或双引号)括起来。
如果属性值中包含特殊字符,可以使用CData段(如下所示)或实体引用。
XML关键词检索算法的研究与实现的开题报告
XML关键词检索算法的研究与实现的开题报告一、选题背景和意义随着Internet的迅猛发展, Web服务得到了广泛的应用, 其中以XML(eXtensible Markup Language)语言为基础的Web服务尤为重要。
XML是一种用于描述数据的标记语言, 它拥有强大的灵活性、可扩展性和可读性, 成为互联网中最为流行的数据交换格式之一。
然而, 在XML文档中, 包含了大量的信息, 如何快速、准确地检索出与用户需要相匹配的信息, 是XML文档检索研究的关键问题。
目前, 已经有许多关于XML文档检索的研究, 其中以基于关键词检索的方法为主流。
因此, 本文旨在研究XML关键词检索算法, 并将其实现为一个实用的检索系统, 以方便用户快速、准确地检索出所需信息。
二、研究内容1.分析当前XML文档检索的研究现状, 包括国内外的研究进展和存在的问题。
2.对XML文档中的节点进行索引, 提高检索效率。
3.设计并实现了基于关键词的XML文档检索算法, 针对多种检索关键词的情况进行优化。
4.设计并实现了一个实用的XML文档检索系统, 通过软件界面进行检索操作, 对检索结果进行展示。
5.对检索效率和精度进行测试, 优化慢查询和高并发请求, 提高系统的性能和可靠性。
三、研究方法和实施步骤1.综合文献, 系统性地分析当前XML文档检索的研究现状以及存在的问题。
2.设计并实现索引算法, 将XML文档中的节点进行索引。
3.设计并实现关键词检索算法, 实现基于关键词的XML文档检索。
4.设计并实现XML文档检索系统, 包括用户界面、后端处理和数据存储等组成部分。
5.进行系统的性能测试和异常处理, 对系统进行优化。
四、预期结果和意义本文将设计并实现一个基于关键词的XML文档检索系统, 解决XML文档检索的瓶颈问题, 并为用户提供可靠、快速、准确地检索服务。
五、进度安排1.前期研究(2个月), 包括文献综述和需求分析等阶段。
2.系统设计和实现(5个月), 包括索引算法、关键词检索算法和XML文档检索系统的设计与实现。
高效的xml 关键字查询改写和结果生成技术pdf
收稿日期:基金项目:国家自然科学基金项目(60833005, 60573091), 国家863 计划(2007AA01Z155, 2009AA011904, 2009AA01Z133),教育部博士点基金项目(200800020002),教育部重点项目(109004)本文通讯作者:陆嘉恒(jiahenglu@)高效的XML 关键字查询改写和结果生成技术黄静 陆嘉恒 孟小峰(中国人民大学信息学院 北京 100872) (huangjingruc@)Efficient XML Keyword Query Refinement with Meaningful Results GenerationHuang Jing, Lu Jiaheng, and Meng Xiaofeng(School of Information, Renmin University of China, Beijing 100872)Abstract Keyword search method provides users with a friendly way to query XML data, but a user’s keyword query may often be an imperfect description of their intention. Even when the information need is well described, a search engine may not be able to return the results matching the query as stated. The task of refining the user’s original query is first defined to achieve better result quality as the problem of keyword query refinement in XML keyword search, and guidelines are designed to decide whether query refinement is necessary. Four refinement operations are defined, namely term deletion, merging, split and substitution. Since there may be more then one query refinement candidates, proposes the definition of refinement cost, which is used as a measure of semantic distance between the original query and refined query, and also a dynamic programming solution to compute refinement cost. In order to achieve the goal of finding the best refined queries and generate their associated results within a one-time node list scan, a stack-based algorithm is proposed, followed by a generalized partition-based optimization, which improves the efficiency a lot. Finally, extensive experiments have been done to show efficiency and effectiveness of the query refinement approach.Keywords XML; Keyword Search; Query Refinement;Query Rewriting; Query Suggestion; SLCA摘要 用户使用关键字查询时,可能不能准确的表达他们的意图,即使用户正确的表达了查询意图,查询引擎也可能不能准确地返回查询结果.针对这一问题,重点研究了在XML 关键字查询中如何进行有效的查询改写并生成有意义的结果.提出四种查询改写操作和查询改写代价的概念,给出了动态规划的方法计算查询改写代价.为了找出最优的查询改写,给出了基于栈的查询改写和结果生成算法,并提出了基于划分的优化算法.最后通过丰富的实验对提出的方法进行了验证.关键词 XML; 关键字查询; 查询改写; 查询重写;查询推荐;SLCA中图法分类号 TP3910引言关键字查询为用户提供了友好便捷的查询方式,如何使用关键字查询从XML 数据中获取所需信息已经成为学术界近期研究的一个热点问题[1-5].这些工作主要研究如何过滤无关的查询结果来提高查准率.本文关注的是另一个方面:当查询没有结果返回或是返回太少结果时,如何通过改写原始查询,使得新的查询获得好的查全率.这种情况是普遍存在于关键字查询中的,由于用户可能不能准确表达查询意图,输入的查询可能存在拼写错误或不相关的词,这样使得某些关键字在文档中找不到匹配的结点,导致没有结果返回.第二十六届中国数据库学术会议论文集:1-7,2009.10Fig. 1 Example XML document 图1 带Dewey编码的XML文档即使用户准确地表达了查询意图,搜索引擎也可能不能准确地返回用户想要的结果,例如,对于图1所示的XML文档,用户输入查询“paper, XML”以查找关于XML的文章,文档中包含inproceedings(0.0.1.0)、inproceedings(0.1.1.0)和article(0.1.1.2)三个解,但由于关键字“paper”在文档中无法找到匹配的结点,以至于没有结果返回.根据文献[6]中的调查统计,用户使用关键字搜索时,大概有10%-15%的查询存在错误,有40%-51%的查询要经过修改才能获得所需的信息.我们将改写用户输入的原始查询来获得更好查询结果的问题定义为查询改写。
php xmlwriter 方法
php xmlwriter 方法PHP XMLWriter 是一个用于生成 XML 文档的类。
它提供了一种简单而灵活的方式来创建和修改 XML 文件。
使用 XMLWriter 类,我们可以轻松地创建一个 XML 文档。
首先,我们需要创建一个 XMLWriter 对象,然后使用它的方法来添加元素、属性和文本内容。
XMLWriter 类的主要方法包括:startDocument、endDocument、startElement、endElement、writeElement、writeAttribute 和writeText。
startDocument 方法用于开始一个 XML 文档,并设置其版本和编码。
endDocument 方法用于结束 XML 文档的编写。
startElement 方法用于开始一个元素节点,并设置其名称。
endElement 方法用于结束当前元素节点的编写。
writeElement 方法用于添加一个完整的元素节点,包括开始和结束标签以及文本内容。
writeAttribute 方法用于添加一个元素节点的属性。
writeText 方法用于添加元素节点的文本内容。
通过这些简单的方法,我们可以灵活地创建和修改 XML 文档。
XMLWriter 类还提供了一些其他的方法,如设置缩进、格式化输出等。
总结一下,PHP XMLWriter 类是一个非常有用的工具,可以帮助我们轻松地创建和修改 XML 文档。
它提供了简单而灵活的方法,使我们可以快速地生成符合标准的 XML 文件。
无论是生成 RSS 订阅源、配置文件还是生成复杂的数据结构,XMLWriter 都是一个强大而实用的工具。
希望本文能够帮助大家更好地理解和使用PHP XMLWriter 类。
xml的作用与功能主治
XML的作用与功能主治1. 简介XML(eXtensible Markup Language)是一种标记语言,用于描述文档结构和数据内容。
它被广泛应用于数据交换、配置文件和Web服务等领域。
本文将介绍XML的作用和功能,以及它在不同领域中的主治能力。
2. XML的作用2.1. 数据交换XML提供了一种结构化的标记语言,可以用于在不同系统之间进行数据交换。
通过定义标签和属性,可以在XML文档中存储和传输数据。
XML数据可以被解析和读取,并用于数据转换或集成系统。
2.2. 配置文件XML还可以用于配置文件的存储和管理。
许多应用程序使用XML格式的配置文件来存储参数和设置。
通过使用标签和属性,可以按照特定的格式组织配置信息,并方便地进行修改和维护。
2.3. Web服务XML在Web服务中扮演了重要的角色。
通过使用XML,可以在不同平台和编程语言之间传递数据。
XML还可以在Web服务中定义消息格式,以实现系统之间的通信和数据交换。
3. XML的功能3.1. 分层结构XML使用标签和元素来组织数据,这种层次结构使得数据能够被清晰地呈现和访问。
不同级别的标签可以表示数据的不同层次结构,从而提供了更好的组织和管理能力。
3.2. 可扩展性XML的可扩展性使得用户可以根据自己的需求定义标签和元素。
这意味着XML可以适应不同的数据结构和应用场景,满足各种需求。
3.3. 高度可读性XML使用文本格式存储数据,可以直接查看和编辑。
相比二进制格式,XML更容易理解和修改。
同时,XML还支持注释和文档类型定义(DTD),提高了文档的可读性和可维护性。
3.4. 数据验证与约束通过使用文档类型定义(DTD)或XML Schema,XML可以对数据进行验证和约束。
这样可以确保数据的有效性和一致性,减少错误和数据不一致的可能性。
3.5. 跨平台和跨语言支持XML是一种平台无关的标记语言,可以在不同操作系统和平台上使用。
同时,由于XML使用文本格式存储数据,所以可以在不同编程语言之间进行交互和处理。
如何用XMLSpy生成高效的代码
如何用XMLSpy生成简洁代码如生成MACS-ATS_FILE.xsd的代码,由于定义Element和ComplexType的不同,导致生成的代码有差别。
自然,生成最少的代码来实现需求是最期待的。
为实现生成最少的代码,有以下几个原则:1.Node中有共同属性的情况下,抽象出一个属性类Attribute类。
其他类继承此类。
如Signal类继承了Attribute类。
2.当确认一个类型要被重用时,每个ComplexType类型对应一个同名字的Element。
例如Head没有被重用,可以不需要对应的Element,而Signal肯定被重用了,就是多处用到这个类型信息,那么就定义一个ComplexType和一个同名的Element。
Element的type选项即为ComplexType的Signal类型。
这样,只要XSD文件中用到Signal的地方都是添加一个Element,而且是引用类型。
注意到属性isRef被选上了。
这样定义自动生成的C++代码就最少,不出出现讨厌的Class2现象。
以下是Demo.xsd文件的截图。
所生成的所有类。
定义Attribute抽象类导致很好的代码复用。
举个例子,以下类时没有根据定义原则生成的,导致出现了较多类和Cobject2的现象。
定义少,但生成的类多。
在编辑后生成的类(同样完成设计要求)。
定义多,但生成的类少。
根据XMLSpy自动生成的代码,分析分析就会发现,其实完全可以通过一个类来处理xml文件,而可以抛开生成的这些类。
一个类XMLNode可以处理所有符合对应XSD文件的XML文件的读写。
--------------------------------------------------HPP文件--------------------------------------------------class XMLNode : public CNode{public:XMLNode() : CNode() {}XMLNode(xercesc::DOMNode* pThisNode) : CNode(pThisNode) {}XMLNode(xercesc::DOMDocument* pDoc) : CNode(pDoc) {}virtual ~XMLNode() {}static EGroupType GetGroupType();void AdjustPrefix();//// string StringValue//void AddStringValue(tstring strNodeName, CSchemaString StringValue);void InsertStringValueAt(tstring strNodeName, CSchemaString StringValue, int iIndex = 0);void ReplaceStringValueAt(tstring strNodeName, CSchemaString StringValue, int iIndex = 0); CSchemaString GetStringValueAt(tstring strNodeName, int iIndex = 0);CSchemaString GetStringValueAtCursor(xercesc::DOMNode* pCurNode);//// boolean BoolValue//void AddBoolValue(tstring strNodeName, CSchemaBoolean BoolValue);void InsertBoolValueAt(tstring strNodeName, CSchemaBoolean BoolValue, int iIndex = 0); void ReplaceBoolValueAt(tstring strNodeName, CSchemaBoolean BoolValue, int iIndex = 0); CSchemaBoolean GetBoolValueAt(tstring strNodeName, int iIndex = 0);CSchemaBoolean GetBoolValueAtCursor(xercesc::DOMNode* pCurNode);//// byte ByteValue//void AddByteValue(tstring strNodeName, CSchemaByte ByteValue);void InsertByteValueAt(tstring strNodeName, CSchemaByte ByteValue, int iIndex = 0);void ReplaceByteValueAt(tstring strNodeName, CSchemaByte ByteValue, int iIndex = 0); CSchemaByte GetByteValueAt(tstring strNodeName, int iIndex = 0);CSchemaByte GetByteValueAtCursor(xercesc::DOMNode* pCurNode);//// int IntValue//void AddIntValue(tstring strNodeName, CSchemaInt IntValue);void InsertIntValueAt(tstring strNodeName, CSchemaInt IntValue, int iIndex = 0);void ReplaceIntValueAt(tstring strNodeName, CSchemaInt IntValue, int iIndex = 0); CSchemaInt GetIntValueAt(tstring strNodeName, int iIndex = 0);CSchemaInt GetIntValueAtCursor(xercesc::DOMNode* pCurNode);//// float FloatValue//void AddFloatValue(tstring strNodeName, CSchemaFloat FloatValue);void InsertFloatValueAt(tstring strNodeName, CSchemaFloat FloatValue, int iIndex = 0); void ReplaceFloatValueAt(tstring strNodeName, CSchemaFloat FloatValue, int iIndex = 0); CSchemaFloat GetFloatValueAt(tstring strNodeName, int iIndex = 0);CSchemaFloat GetFloatValueAtCursor(xercesc::DOMNode* pCurNode);//// short ShortValue//void AddShortValue(tstring strNodeName, CSchemaShort ShortValue);void InsertShortValueAt(tstring strNodeName, CSchemaShort ShortValue, int iIndex = 0); void ReplaceShortValueAt(tstring strNodeName, CSchemaShort ShortValue, int iIndex = 0); CSchemaShort GetShortValueAt(tstring strNodeName, int iIndex = 0);CSchemaShort GetShortValueAtCursor(xercesc::DOMNode* pCurNode);//// long LongValue//void AddLongValue(tstring strNodeName, CSchemaLong LongValue);void InsertLongValueAt(tstring strNodeName, CSchemaLong LongValue, int iIndex = 0);void ReplaceLongValueAt(tstring strNodeName, CSchemaLong LongValue, int iIndex = 0); CSchemaLong GetLongValueAt(tstring strNodeName, int iIndex = 0);CSchemaLong GetLongValueAtCursor(xercesc::DOMNode* pCurNode);//// date DateValue//void AddDateValue(tstring strNodeName, CSchemaDate DateValue);void InsertDateValueAt(tstring strNodeName, CSchemaDate DateValue, int iIndex = 0);void ReplaceDateValueAt(tstring strNodeName, CSchemaDate DateValue, int iIndex = 0); CSchemaDate GetDateValueAt(tstring strNodeName, int iIndex = 0);CSchemaDate GetDateValueAtCursor(xercesc::DOMNode* pCurNode);//// time TimeValue//void AddTimeValue(tstring strNodeName, CSchemaTime TimeValue);void InsertTimeValueAt(tstring strNodeName, CSchemaTime TimeValue, int iIndex = 0);void ReplaceTimeValueAt(tstring strNodeName, CSchemaTime TimeValue, int iIndex = 0); CSchemaTime GetTimeValueAt(tstring strNodeName, int iIndex = 0);CSchemaTime GetTimeValueAtCursor(xercesc::DOMNode* pCurNode);//// dateTime DateTimeValue//void AddDateTimeValue(tstring strNodeName, CSchemaDateTime DateTimeValue);void InsertDateTimeValueAt(tstring strNodeName, CSchemaDateTime DateTimeValue, int iIndex = 0); void ReplaceDateTimeValueAt(tstring strNodeName, CSchemaDateTime DateTimeValue, int iIndex = 0); CSchemaDateTime GetDateTimeValueAt(tstring strNodeName, int iIndex = 0);CSchemaDateTime GetDateTimeValueAtCursor(xercesc::DOMNode* pCurNode);//// XMLNode XMLNodeValue//int GetCount(tstring strNodeName);bool Has(tstring strNodeName);xercesc::DOMNode* GetStartingCursor(tstring strNodeName);xercesc::DOMNode* GetAdvancedCursor(tstring strNodeName, xercesc::DOMNode* pCurNode);void RemoveAt(tstring strNodeName, int iIndex = 0);void Remove(tstring strNodeName);int GetXMLNodeCount(tstring strNodeName);bool HasXMLNode(tstring strNodeName);xercesc::DOMNode* GetStartingXMLNodeCursor(tstring strNodeName);xercesc::DOMNode* GetAdvancedXMLNodeCursor(tstring strNodeName, xercesc::DOMNode* pCurNode); void RemoveXMLNodeAt(tstring strNodeName, int iIndex = 0);void RemoveXMLNode(tstring strNodeName);void AddXMLNode(tstring strNodeName, XMLNode& theXMLNode);void InsertXMLNodeAt(tstring strNodeName, XMLNode& theXMLNode, int iIndex = 0);void ReplaceXMLNodeAt(tstring strNodeName, XMLNode& theXMLNode, int iIndex = 0);XMLNode GetXMLNodeAt(tstring strNodeName, int iIndex = 0);XMLNode GetXMLNodeValueAtCursor(xercesc::DOMNode* pCurNode);};--------------------------------------------------CPP文件--------------------------------------------------CNode::EGroupType XMLNode::GetGroupType(){return eSequence;}void XMLNode::AdjustPrefix(){//do nothing here.}/************************************************************************//* String StringValue *//************************************************************************/void XMLNode::AddStringValue(tstring strNodeName, CSchemaString StringValue){InternalAppend(Attribute, _T(""), _T(strNodeName), StringValue);}void XMLNode::InsertStringValueAt(tstring strNodeName, CSchemaString StringValue, int iIndex){InternalInsertAt(Attribute, _T(""), _T(strNodeName), iIndex, StringValue);}void XMLNode::ReplaceStringValueAt(tstring strNodeName, CSchemaString StringValue, int iIndex){InternalReplaceAt(Attribute, _T(""), _T(strNodeName), iIndex, StringValue);}CSchemaString XMLNode::GetStringValueAt(tstring strNodeName, int iIndex){return InternalGetNodeValue(Attribute, InternalGetAt(Attribute, _T(""), _T(strNodeName), iIndex)).c_str(); }CSchemaString XMLNode::GetStringValueAtCursor(xercesc::DOMNode* pCurNode){if( pCurNode == NULL )throw CXmlException(CXmlException::eError1, _T("Index out of range"));else{if(pCurNode->getNodeType() == xercesc::DOMNode::ELEMENT_NODE){xercesc::DOMNode* pChildNode = pCurNode->getFirstChild();if( !pChildNode )return _T("");if(pChildNode->getNodeValue())return XC2TS(pChildNode->getNodeValue());elsereturn _T("");}else{if( pCurNode->getNodeValue() )return XC2TS(pCurNode->getNodeValue());elsereturn _T("");}}}/************************************************************************//* Bool BoolValue *//************************************************************************/void XMLNode::AddBoolValue(tstring strNodeName, CSchemaBoolean BoolValue){InternalAppend(Attribute, _T(""), _T(strNodeName), BoolValue);}void XMLNode::InsertBoolValueAt(tstring strNodeName, CSchemaBoolean BoolValue, int iIndex){InternalInsertAt(Attribute, _T(""), _T(strNodeName), iIndex, BoolValue);}void XMLNode::ReplaceBoolValueAt(tstring strNodeName, CSchemaBoolean BoolValue, int iIndex){InternalReplaceAt(Attribute, _T(""), _T(strNodeName), iIndex, BoolValue);}CSchemaBoolean XMLNode::GetBoolValueAt(tstring strNodeName, int iIndex){return InternalGetNodeValue(Attribute, InternalGetAt(Attribute, _T(""), _T(strNodeName), iIndex)).c_str(); }CSchemaBoolean XMLNode::GetBoolValueAtCursor(xercesc::DOMNode* pCurNode){if( pCurNode == NULL )throw CXmlException(CXmlException::eError1, _T("Index out of range"));else{if(pCurNode->getNodeType() == xercesc::DOMNode::ELEMENT_NODE){xercesc::DOMNode* pChildNode = pCurNode->getFirstChild();if( !pChildNode )return _T("");if(pChildNode->getNodeValue())return XC2TS(pChildNode->getNodeValue());elsereturn _T("");}else{if( pCurNode->getNodeValue() )return XC2TS(pCurNode->getNodeValue());elsereturn _T("");}}}/************************************************************************//* Byte ByteValue *//************************************************************************/void XMLNode::AddByteValue(tstring strNodeName, CSchemaByte ByteValue){InternalAppend(Attribute, _T(""), _T(strNodeName), ByteValue);}void XMLNode::InsertByteValueAt(tstring strNodeName, CSchemaByte ByteValue, int iIndex){InternalInsertAt(Attribute, _T(""), _T(strNodeName), iIndex, ByteValue);}void XMLNode::ReplaceByteValueAt(tstring strNodeName, CSchemaByte ByteValue, int iIndex){InternalReplaceAt(Attribute, _T(""), _T(strNodeName), iIndex, ByteValue);}CSchemaByte XMLNode::GetByteValueAt(tstring strNodeName, int iIndex){return InternalGetNodeValue(Attribute, InternalGetAt(Attribute, _T(""), _T(strNodeName), iIndex)).c_str(); }CSchemaByte XMLNode::GetByteValueAtCursor(xercesc::DOMNode* pCurNode){if( pCurNode == NULL )throw CXmlException(CXmlException::eError1, _T("Index out of range"));else{if(pCurNode->getNodeType() == xercesc::DOMNode::ELEMENT_NODE){xercesc::DOMNode* pChildNode = pCurNode->getFirstChild();if( !pChildNode )return _T("");if(pChildNode->getNodeValue())return XC2TS(pChildNode->getNodeValue());elsereturn _T("");}else{if( pCurNode->getNodeValue() )return XC2TS(pCurNode->getNodeValue());elsereturn _T("");}}}/************************************************************************//* Int IntValue *//************************************************************************/void XMLNode::AddIntValue(tstring strNodeName, CSchemaInt IntValue){InternalAppend(Attribute, _T(""), _T(strNodeName), IntValue);}void XMLNode::InsertIntValueAt(tstring strNodeName, CSchemaInt IntValue, int iIndex) {InternalInsertAt(Attribute, _T(""), _T(strNodeName), iIndex, IntValue);}void XMLNode::ReplaceIntValueAt(tstring strNodeName, CSchemaInt IntValue, int iIndex) {InternalReplaceAt(Attribute, _T(""), _T(strNodeName), iIndex, IntValue);}CSchemaInt XMLNode::GetIntValueAt(tstring strNodeName, int iIndex){return InternalGetNodeValue(Attribute, InternalGetAt(Attribute, _T(""), _T(strNodeName), iIndex)).c_str(); }CSchemaInt XMLNode::GetIntValueAtCursor(xercesc::DOMNode* pCurNode){if( pCurNode == NULL )throw CXmlException(CXmlException::eError1, _T("Index out of range"));else{if(pCurNode->getNodeType() == xercesc::DOMNode::ELEMENT_NODE){xercesc::DOMNode* pChildNode = pCurNode->getFirstChild();if( !pChildNode )return _T("");if(pChildNode->getNodeValue())return XC2TS(pChildNode->getNodeValue());elsereturn _T("");}else{if( pCurNode->getNodeValue() )return XC2TS(pCurNode->getNodeValue());elsereturn _T("");}}}/************************************************************************//* Float FloatValue *//************************************************************************/void XMLNode::AddFloatValue(tstring strNodeName, CSchemaFloat FloatValue){InternalAppend(Attribute, _T(""), _T(strNodeName), FloatValue);}void XMLNode::InsertFloatValueAt(tstring strNodeName, CSchemaFloat FloatValue, int iIndex){InternalInsertAt(Attribute, _T(""), _T(strNodeName), iIndex, FloatValue);}void XMLNode::ReplaceFloatValueAt(tstring strNodeName, CSchemaFloat FloatValue, int iIndex){InternalReplaceAt(Attribute, _T(""), _T(strNodeName), iIndex, FloatValue);}CSchemaFloat XMLNode::GetFloatValueAt(tstring strNodeName, int iIndex){return InternalGetNodeValue(Attribute, InternalGetAt(Attribute, _T(""), _T(strNodeName), iIndex)).c_str(); }CSchemaFloat XMLNode::GetFloatValueAtCursor(xercesc::DOMNode* pCurNode){if( pCurNode == NULL )throw CXmlException(CXmlException::eError1, _T("Index out of range"));else{if(pCurNode->getNodeType() == xercesc::DOMNode::ELEMENT_NODE){xercesc::DOMNode* pChildNode = pCurNode->getFirstChild();if( !pChildNode )return _T("");if(pChildNode->getNodeValue())return XC2TS(pChildNode->getNodeValue());elsereturn _T("");}else{if( pCurNode->getNodeValue() )return XC2TS(pCurNode->getNodeValue());elsereturn _T("");}}}/************************************************************************//* Short ShortValue *//************************************************************************/void XMLNode::AddShortValue(tstring strNodeName, CSchemaShort ShortValue){InternalAppend(Attribute, _T(""), _T(strNodeName), ShortValue);}void XMLNode::InsertShortValueAt(tstring strNodeName, CSchemaShort ShortValue, int iIndex){InternalInsertAt(Attribute, _T(""), _T(strNodeName), iIndex, ShortValue);}void XMLNode::ReplaceShortValueAt(tstring strNodeName, CSchemaShort ShortValue, int iIndex){InternalReplaceAt(Attribute, _T(""), _T(strNodeName), iIndex, ShortValue);CSchemaShort XMLNode::GetShortValueAt(tstring strNodeName, int iIndex){return InternalGetNodeValue(Attribute, InternalGetAt(Attribute, _T(""), _T(strNodeName), iIndex)).c_str(); }CSchemaShort XMLNode::GetShortValueAtCursor(xercesc::DOMNode* pCurNode){if( pCurNode == NULL )throw CXmlException(CXmlException::eError1, _T("Index out of range"));else{if(pCurNode->getNodeType() == xercesc::DOMNode::ELEMENT_NODE){xercesc::DOMNode* pChildNode = pCurNode->getFirstChild();if( !pChildNode )return _T("");if(pChildNode->getNodeValue())return XC2TS(pChildNode->getNodeValue());elsereturn _T("");}else{if( pCurNode->getNodeValue() )return XC2TS(pCurNode->getNodeValue());elsereturn _T("");}}}/************************************************************************//* Long LongValue *//************************************************************************/void XMLNode::AddLongValue(tstring strNodeName, CSchemaLong LongValue){InternalAppend(Attribute, _T(""), _T(strNodeName), LongValue);}void XMLNode::InsertLongValueAt(tstring strNodeName, CSchemaLong LongValue, int iIndex){InternalInsertAt(Attribute, _T(""), _T(strNodeName), iIndex, LongValue);}void XMLNode::ReplaceLongValueAt(tstring strNodeName, CSchemaLong LongValue, int iIndex)InternalReplaceAt(Attribute, _T(""), _T(strNodeName), iIndex, LongValue);}CSchemaLong XMLNode::GetLongValueAt(tstring strNodeName, int iIndex){return InternalGetNodeValue(Attribute, InternalGetAt(Attribute, _T(""), _T(strNodeName), iIndex)).c_str(); }CSchemaLong XMLNode::GetLongValueAtCursor(xercesc::DOMNode* pCurNode){if( pCurNode == NULL )throw CXmlException(CXmlException::eError1, _T("Index out of range"));else{if(pCurNode->getNodeType() == xercesc::DOMNode::ELEMENT_NODE){xercesc::DOMNode* pChildNode = pCurNode->getFirstChild();if( !pChildNode )return _T("");if(pChildNode->getNodeValue())return XC2TS(pChildNode->getNodeValue());elsereturn _T("");}else{if( pCurNode->getNodeValue() )return XC2TS(pCurNode->getNodeValue());elsereturn _T("");}}}/************************************************************************//* Date DateValue *//************************************************************************/void XMLNode::AddDateValue(tstring strNodeName, CSchemaDate DateValue){InternalAppend(Attribute, _T(""), _T(strNodeName), DateValue);}void XMLNode::InsertDateValueAt(tstring strNodeName, CSchemaDate DateValue, int iIndex){InternalInsertAt(Attribute, _T(""), _T(strNodeName), iIndex, DateValue);}void XMLNode::ReplaceDateValueAt(tstring strNodeName, CSchemaDate DateValue, int iIndex){InternalReplaceAt(Attribute, _T(""), _T(strNodeName), iIndex, DateValue);}CSchemaDate XMLNode::GetDateValueAt(tstring strNodeName, int iIndex){return InternalGetNodeValue(Attribute, InternalGetAt(Attribute, _T(""), _T(strNodeName), iIndex)).c_str(); }CSchemaDate XMLNode::GetDateValueAtCursor(xercesc::DOMNode* pCurNode){if( pCurNode == NULL )throw CXmlException(CXmlException::eError1, _T("Index out of range"));else{if(pCurNode->getNodeType() == xercesc::DOMNode::ELEMENT_NODE){xercesc::DOMNode* pChildNode = pCurNode->getFirstChild();if( !pChildNode )return _T("");if(pChildNode->getNodeValue())return XC2TS(pChildNode->getNodeValue());elsereturn _T("");}else{if( pCurNode->getNodeValue() )return XC2TS(pCurNode->getNodeValue());elsereturn _T("");}}}/************************************************************************//* Time TimeValue *//************************************************************************/void XMLNode::AddTimeValue(tstring strNodeName, CSchemaTime TimeValue){InternalAppend(Attribute, _T(""), _T(strNodeName), TimeValue);}void XMLNode::InsertTimeValueAt(tstring strNodeName, CSchemaTime TimeValue, int iIndex){InternalInsertAt(Attribute, _T(""), _T(strNodeName), iIndex, TimeValue);}void XMLNode::ReplaceTimeValueAt(tstring strNodeName, CSchemaTime TimeValue, int iIndex){InternalReplaceAt(Attribute, _T(""), _T(strNodeName), iIndex, TimeValue);}CSchemaTime XMLNode::GetTimeValueAt(tstring strNodeName, int iIndex){return InternalGetNodeValue(Attribute, InternalGetAt(Attribute, _T(""), _T(strNodeName), iIndex)).c_str(); }CSchemaTime XMLNode::GetTimeValueAtCursor(xercesc::DOMNode* pCurNode){if( pCurNode == NULL )throw CXmlException(CXmlException::eError1, _T("Index out of range"));else{if(pCurNode->getNodeType() == xercesc::DOMNode::ELEMENT_NODE){xercesc::DOMNode* pChildNode = pCurNode->getFirstChild();if( !pChildNode )return _T("");if(pChildNode->getNodeValue())return XC2TS(pChildNode->getNodeValue());elsereturn _T("");}else{if( pCurNode->getNodeValue() )return XC2TS(pCurNode->getNodeValue());elsereturn _T("");}}}/************************************************************************//* DateTime DateTimeValue *//************************************************************************/void XMLNode::AddDateTimeValue(tstring strNodeName, CSchemaDateTime DateTimeValue){InternalAppend(Attribute, _T(""), _T(strNodeName), DateTimeValue);}void XMLNode::InsertDateTimeValueAt(tstring strNodeName, CSchemaDateTime DateTimeValue, int iIndex){InternalInsertAt(Attribute, _T(""), _T(strNodeName), iIndex, DateTimeValue);}void XMLNode::ReplaceDateTimeValueAt(tstring strNodeName, CSchemaDateTime DateTimeValue, int iIndex){InternalReplaceAt(Attribute, _T(""), _T(strNodeName), iIndex, DateTimeValue);}CSchemaDateTime XMLNode::GetDateTimeValueAt(tstring strNodeName, int iIndex){return InternalGetNodeValue(Attribute, InternalGetAt(Attribute, _T(""), _T(strNodeName), iIndex)).c_str(); }CSchemaDateTime XMLNode::GetDateTimeValueAtCursor(xercesc::DOMNode* pCurNode){if( pCurNode == NULL )throw CXmlException(CXmlException::eError1, _T("Index out of range"));else{if(pCurNode->getNodeType() == xercesc::DOMNode::ELEMENT_NODE){xercesc::DOMNode* pChildNode = pCurNode->getFirstChild();if( !pChildNode )return _T("");if(pChildNode->getNodeValue())return XC2TS(pChildNode->getNodeValue());elsereturn _T("");}else{if( pCurNode->getNodeValue() )return XC2TS(pCurNode->getNodeValue());elsereturn _T("");}}}/************************************************************************//* XMLNode XMLNodeValue *//************************************************************************/int XMLNode::GetCount(tstring strNodeName){return ChildCountInternal(Attribute, _T(""), strNodeName);}bool XMLNode::Has(tstring strNodeName){return InternalHasChild(Attribute, _T(""), strNodeName);}xercesc::DOMNode* XMLNode::GetStartingCursor(tstring strNodeName){return InternalGetFirstChild(Attribute, _T(""), strNodeName);}xercesc::DOMNode* XMLNode::GetAdvancedCursor(tstring strNodeName, xercesc::DOMNode* pCurNode){return InternalGetNextChild(Element, _T(""), _T(strNodeName), pCurNode);}void XMLNode::RemoveAt(tstring strNodeName, int iIndex){InternalRemoveAt(Attribute, _T(""), _T(strNodeName), iIndex);}void XMLNode::Remove(tstring strNodeName){while (Has(strNodeName))RemoveAt(strNodeName, 0);}int XMLNode::GetXMLNodeCount(tstring strNodeName){return ChildCountInternal(Element, _T(""), strNodeName);}bool XMLNode::HasXMLNode(tstring strNodeName){return InternalHasChild(Element, _T(""), strNodeName);}xercesc::DOMNode* XMLNode::GetStartingXMLNodeCursor(tstring strNodeName){return InternalGetFirstChild(Element, _T(""), strNodeName);}xercesc::DOMNode* XMLNode::GetAdvancedXMLNodeCursor(tstring strNodeName, xercesc::DOMNode* pCurNode) {return InternalGetNextChild(Element, _T(""), _T(strNodeName), pCurNode);}void XMLNode::RemoveXMLNodeAt(tstring strNodeName,int iIndex){InternalRemoveAt(Element, _T(""), _T(strNodeName), iIndex);}void XMLNode::RemoveXMLNode(tstring strNodeName){while (HasXMLNode(strNodeName))RemoveXMLNodeAt(strNodeName, 0);}void XMLNode::AddXMLNode(tstring strNodeName, XMLNode& theXMLNode){InternalAppendNode(_T(""), _T(strNodeName), theXMLNode);}void XMLNode::InsertXMLNodeAt(tstring strNodeName, XMLNode& theXMLNode, int iIndex) {InternalInsertNodeAt(_T(""), _T(strNodeName), iIndex, theXMLNode);}void XMLNode::ReplaceXMLNodeAt(tstring strNodeName, XMLNode& theXMLNode, int iIndex) {InternalReplaceNodeAt(_T(""), _T(strNodeName), iIndex, theXMLNode);}XMLNode XMLNode::GetXMLNodeAt(tstring strNodeName, int iIndex){return XMLNode(InternalGetAt(Element, _T(""), _T(strNodeName), iIndex));}XMLNode XMLNode::GetXMLNodeValueAtCursor(xercesc::DOMNode* pCurNode){if( pCurNode == NULL )throw CXmlException(CXmlException::eError1, _T("Index out of range"));else{return XMLNode(pCurNode);}}。
Java XML处理解析和生成XML数据
Java XML处理解析和生成XML数据Java作为一种广泛使用的编程语言,提供了丰富的API和工具来处理和操作XML数据。
本文将介绍Java中处理和解析XML数据的基本方法,并探讨如何使用Java生成XML数据。
一、XML简介XML(可扩展标记语言)是一种用于描述数据的标记语言,它的设计目标是传输数据而不仅仅是显示数据。
XML以一种结构化的方式存储数据,使得数据具有良好的可读性和可扩展性。
二、XML解析XML解析是指将XML数据转换为Java程序可以理解和处理的格式。
Java提供了几种XML解析方法,包括DOM(文档对象模型)、SAX(简单API for XML)和StAX(流API for XML)。
1. DOM解析DOM解析是最常用和最常见的XML解析方法之一。
DOM将XML文档视为一个树形结构,通过解析整个文档并将其加载到内存中,以方便对数据的操作和访问。
使用DOM解析XML的基本步骤如下:(1)创建一个DocumentBuilder对象。
(2)使用DocumentBuilder对象的parse()方法解析XML文件,返回一个Document对象。
(3)通过Document对象获取XML文件中的节点和元素。
以下是一个使用DOM解析XML的示例代码:```DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();DocumentBuilder builder = factory.newDocumentBuilder();Document document = builder.parse(new File("example.xml"));// 获取根节点Element rootElement = document.getDocumentElement();// 获取子节点NodeList nodeList = rootElement.getElementsByTagName("book");for (int i = 0; i < nodeList.getLength(); i++) {Element bookElement = (Element) nodeList.item(i);String title =bookElement.getElementsByTagName("title").item(0).getTextContent( );String author =bookElement.getElementsByTagName("author").item(0).getTextContent();System.out.println("Title: " + title + ", Author: " + author);}```2. SAX解析SAX解析是一种基于事件驱动的解析方法。
一种基于自然语言生成的XML关键字查询技术
Y N Qu y h X A S ixo g A i— a , I h— in
、 ,
中国矿业大学 计算机科学与技术学院 , 苏 徐州 2 1 1 江 216
S h o fCo ue ce c n e h ooy, hn iest fMiig a d T c n lg Xuh n,in s 2 1 6, hn c o lo mp tr S in e a d T c n lg C ia Unv ri o nn n e h oo y, z o Ja gu 2 1 1 C ia y
a p i d t p l o XML d c me t a d c n e u nl s a e ol cin s e u e . i h s r f t r g t e me s g ol ci n, e e e o u n n o s q e t me s g c l t i y e o d d c dW t t e u e i e i h s a e c l t h l n e o t s ~ h ma t e r h i c iv d n h f c e c s i r v d g e t . n l s ac s a h e e a d t e ef in y i mp o e r a l e i y
维普资讯
10 2 0 ,4 2 ) 5 0 84 (6
FastMatch:一种高效的XML关键字查询算法
21 0 2年 6月
计 算 机 应 用 研 究
Ap l a in R s a c fCo u e s p i t e e r h o mp t r c o
Vo . 9 No 6 12 .
Jn 0 2 u .2 1
Hale Waihona Puke F sMac : a t th 一种 高效 的 X 键 字查 询 算 法 术 ML关
m oe t a nc r h n o e,S ti n f c e n r cie. T ov h sp o l m ,t i a e r po e t d u ed fs r up t e c O i si e inti p a tc i o s l et i r b e h sp p rp o s d a meho s a tg o or du e t e tme fs a n h n et d lss.te r p s d aag rt m a d Fa t th b s d o h eh d. i lo ih f u h i s o c nig t e i v re it h n p o o e lo ih n me sMa c a e n t e m t o Th sag rt m o nd als te es lsm e tng s me c ran c nd to y s a n n l n de n t e i v re it nl nc Th x e i n a e l ub r e r ut e i o et i o iinsb c n i g al o s i h n e d lss o y o e. t e e p rme tlr — s lsv rf he h g e fr a e o hi t d. ut e iy t ih p ro m nc ft s meho Ke r y wo ds: XM L; k y r e r h; e ce t fs o p;Fa t ac e wo d s a c i f in ; a tg u r sM th
基于同义词规则的高效XML关键字查询
基于同义词规则的高效XML关键字查询张林林,陆嘉恒中国人民大学数据工程与知识工程教育部重点实验室,北京100872摘要:关键字检索是一种简洁友好的查询方式,它不需要用户了解复杂的XML查询语言及XML文档数据结构。
然而,现有的方法往往局限于用户输入的关键字,而不能有效地挖掘用户查询语义。
本篇文章提出了一个基于SLCA查询语义的高效查询算法:TM-IL,它使用同义词、近义词、缩略词等规则来对用户的查询进行改写,从而能有效提高查询质量。
该方法并不依赖于特殊的查询语义,因此该方法可以广泛应用于其它查询技术。
关键词:计算机软件,XML关键词检索,同义词规则,SLCA中图分类号:TP311.5Effective Keyword Search with Synonym Rulesover XML DocumentZhang Linlin,Lu JiahengKey Laboratory of Data Engineering and Knowledge Engineering,Renmin University ofChina,Beijing100872Abstract:Keyword search is a friendly way for user tofind the information they are interested in from XML documents without having to learn a complex query language or needing prior knowledge of the structure of the underlying data.However,the existing methods are usually limited to the input keywords.In this paper,we introduced the notion of synonyms,acronym,abbreviations and so on to capture user query intentions.We propose a SLCA based keyword search with synonym rules over xml documents which are orthogonal to various of xml keyword search techniques.In addition,we also use this to give a effective and efficient slca based keyword search.Key words:Computer Software,XML Keyword Search,Synonym Rule,SLCA0IntroductionIt is becoming increasingly popular to publish data on the Web in the form of XML documents.Keyword search is well-suited to XML trees.It allows users tofind the information they are interested in without having to learn a complex query language or needing prior基金项目:本文受“高等学校博士点学科专项基金资助(编号:20090004120002)作者简介:Zhang Linlin(1987-),male,master,major research direction:Big data management and Cloud computing, XML data management.Correspondence author:Lu Jiaheng(1975-),male,professor,major research direction:Big data management and Cloud computing,XML data management,String Similarity measurement.knowledge of the structure of the underlying data [1–6].XML keyword search enforces a conjunctive search semantics (i.e.all the keywords should be covered in each query result),such as LCA (Lowest Common Ancestor)[5]and its variants [1,2,4].Among those proposals,SLCA(Smallest LCA)is widely adopted [2],where each SLCA result contains all query keywords but has no descendant whose subtree also contains all keywords.Unfortunately,all of them assume each keyword in the query is intended as part of it.However,a user query may often be an imperfect description of their real information need.Even when the information need is well described,a search engine may not be able to return the results matching user’s query intention as illustrated by the following example.Example 1.Consider a query Q =“Jennie paper”issued on a bibliographic document in Figure 1which is modeled using the conventional labeled tree model.The query most likely intend to find all papers written by Jennie.According to SLCA semantics defined in [1],it will return the most specific relevant answers -the subtrees rooted at nodes 0.1.2and 0.3.1.However,terms inproceedings and article are synonyms of paper.Therefore,we should also take nodes 0.1.1,0.1.3and 0.3.2into consideration to predict user’s intention.bib 0conference 0.3conference 0.1name 0.3.0name 0.1.0paper 0.3.1title 0.3.1.0author 0.3.1.1data base 0.3.1.0.0Jennie 0.3.1.1.0inproceedings 0.1.1title 0.1.1.0author 0.1.1.1data base 0.1.1.0.0Jenny 0.1.1.1.0article 0.1.3paper 0.1.2database 0.3.0.0title 0.1.2.0author 0.1.2.1datum base 0.1.2.0.0Jennie 0.1.2.1.0title 0.1.3.0author 0.1.3.1database 0.1.3.0.0Jennie 0.1.3.1.00.1.0.0administrator 0.0Jenny 0.0.0workshop 0.2chair 0.2.00.2.0.0inproceedings 0.3.2title 0.3.2.0author 0.3.2.1base 0.3.2.0.0Jennie 0.3.2.1.0abstract 0.1.2.2base 0.1.2.2.0abstract 0.3.2.2data 0.3.2.2.0图1:bib.xmlAccording to the phenomenon illustrated in Example 1,we should take advantage of the equivalence between strings to improve the quality of keyword search over xml documents.There are many cases where strings that are syntactically far apart can still represent the same real world object.This happens in a variety of conditions such as synonyms,acronyms,abbreviations and so on.For instance,“bike”is a synonym of “bicycle”,“ICDE”is an acronym of “International Conference on Data Engineering”while “DB”is an abbreviation of “database”.We just use “synonym”as a placeholder for any kinds of equivalence expressions.In this paper,we assume there exists a collection of predefined synonym rules.Such rules can be obtained from users’existing dictionaries,explicit inputs,by data and text mining [7]or query log analysis [8].For ease of presentation,we will describe the rules without instantiating with any particular types in this paper.The following example shows how to use these rulesto rewrite input queries thereby capturing user’s potential purpose.Example2.Consider the query in Example1,the answer should be[0.1.2,0.3.1]according to the semantic of SLCA.Given a set of synonyms which are of the form paper→inproceedings and paper→rmally,thefirst rule means that an occurrence of paper can be replaced by inproceedings.Therefore,the query can be expanded to three equivalent strings-{“Jennie paper”,“Jennie inproceedings”,“Jennie article”}.Thus,the answers should be[0.1.2, 0.1.3,0.3.1,0.3.2].In this paper,we study how to design a effective framework with synonym rules to not only support XML keyword search but also improve the quality of existing methods.we take SLCA as the underline keyword search semantic but our framework is orthogonal to other xml keyword search methods.While the synonym rules is expressive and powerful,it also introduces many new challenges.Our major contributions towards xml keyword search with synonym rules are summarized as follows:•We introduce a effective SLCA based keyword search over xml documents with synonym rules to capture user’s potential query intentions.•We give a formal definition of expansion rules.Meanwhile,we also explain how to rewritea query based on given synonym rule sets and deeply analyze all the possible transforma-tions.•we theoretically discuss how to use this framework to support SLCA based xml keyword search and give some optimization techniques.we also design a Transformation Matching based IL Algorithm(TM-IL algorithm)which could efficiently and effectively return SLCA of input keywords.The rest of the paper is organized as follows.In Section2,we formally define synonym rules and show how to use these rules to rewrite user’s queries.Section3and Section4present how to enhance the quality of SLCA based XML keyword search which show that our expansion based framework is orthogonal to the choice of the underlying XML keyword search technique which are not limited to SLCA or it’s variations but also related ranking strategies.In section 5,we give a briefly conclusion and show the future of our work.1Query RewritingThis section presents how we get and use rules to expand our queries.As we know,user input query is a string which can be modeled as a ordered sequence of grams where each grams is a(smaller)string.We can easily convert a string into a ordered gram sequence by splitting itbased on delimiters(white spaces)or regular expression.i.e.the string“International Conf on Data Eng”can be converted to ordered sequence of grams<International,Conf,on,Data,Eng>.1.1Synonym RulesA synonym rule consists of a pair of strings which is of the form lhs→rhs denoted by r which means they are equivalent with bias.Each of lhs and rhs is a string and could not be empty.Example3.Some example of synonym rules•W illiam→Bill•bike→bicycle•V LDB→V eryLargeDataBasesHere we use“→”to refer to the relevance between lhs and rhs which means lhs could be replaced by rhs.however,the inverse is not true due to abbreviations and acronyms might lose information compared with original strings.There are multi-ways to obtain synonym sets.First,we can easily extract synonyms from existing data including various domains such as biology1,address postal service(e.g.,USPS2), academic publications(e.g.,computer science3and medicine4)and so on.Second,synonyms can be obtained by applying data mining methods on datasets to get equivalence classes which represent the same real-world objects.Obviously,this strategy is domain independent,but require more human intervention to get more accurate results.Third,synonyms can be pro-grammatically generated.For example,we can generate rules that connect the integer and textual representation of numbers such as36th and Thirty-Sixth.1.2Query RewritingWe now describe how to rewrite queries when given a set of synonym rules R.The following is an example that illustrate the procedure of expanding one string to another according to synonym rules.Example4.Considering the query Q=“k1k2k3”and synonym rule set R={r1:“k3”→“k4”, r2:“k2k3”→“k2k5”,r3:“k1k2”→“k6k7”}.This means“k3”,“k2k3”and“k1k2”can be replaced by“k4”,“k2k5”and“k6k7”respectively.Consequently,we can get extended strings as follows:1http://www.expasy.ch/sprot2United States Postal Service,.3/CFP.aspx4/•“k1k2k3”r1−→“k1k2k4”•“k1k2k4”r3−→“k6k7k4”•“k1k2k3”r2−→“k1k2k5”•“k1k2k3”r3−→“k6k7k3”Finally,Q will be converted to{“k1k2k3”,“k1k2k4”,“k6k7k4”,“k1k2k5”,“k6k7k3”}, where underline denotes matching transformations.We can see from example4that“k1k2k5”can not be transformed to“k6k7k5”although “k1k2”can be replaced by“k6k7”(r3).Here we only allow1-level transformation which means a gram which is generated as a result of substitution is prohibited participating in a subsequent transformation.It’s easily to see there might be the case that the string transformation would never stop if we don’t restrict the transformation level.Even we restrict the transformation level to1,it is still NP-Complete to determine whether one string could be transformed to another string by applying rules.2Keyword search with synonym rulesIn this section,we review existing related proposals about keyword search and SLCA.Then we will describe our methods based on these notations.2.1NotationWe model XML docuemnt as a tree using conventional labeled ordered tree model.Nodes in the tree corresponds to an XML element and is labeled with a tag denoted byλ(n).Given two nodes u and v,u≺v denotes u is ancestor of v and u≼v denotes u≺v or u=v.We assign to each node a numerical id pre(v)that is compatible with preorder numbering,in the sense that if a node v1precedes a node v2in the preorder left-to-right depth-first traversal of the tree then pre(v1)<pre(v2).The usual<relationship is also compatible with Dewey numbers[1]. For example,0.1.0.0.0<0.1.1.1.We begin by formally introducing the concepts of Lowest Common Ancestor(LCA)and Smallest Lowest Common Ancestor(SLCA).Definition1(LCA).Given m nodes n1,n2,...n m,u is called LCA of nodes n1,n2,...n m iffu is ancestor of each node n i for1≤i≤m and node v,u≺v that v is also ancestor of each node n i.This can be denoted as u=lca(n1,n2,...n m).Given sets of nodes S1,S2,...S m,the LCA of sets S1,S2,...S m is the set of LCA for each combination of nodes in S1through S m which can be denoted as theflowing expression:lca(S1,S2,...S m)={u|u∈lca(n1,n2,...n m)|n1∈S1,n2∈S2,...n m∈S m}Definition2(SLCA).Given a set of nodes S1,S2,...S m,u is called SLCA of S1,S2,...S m, iffu∈lca(S1,S2,...S m)and∀v∈lca(S1,S2,...S m),u⊀v.This can be denoted as slca(S1,S2,...S m)={u|u∈lca(S1,S2,...S m)∧v∈lca(S1,S2,...S m),u⊀v}.Given a set of nodes S,the basic idea of SLCA is that it will return the closest ancestor node which doesn’t contain any descendent node that is also the ancestor of each node in S. Given two nodes v1and v2and their Dewey number dw1and dw2,lca(v1,v2)is the node with Dewey number that is the longest common common prefix of dw1and dw2.For example,the LCA of nodes xx and xx is the node xx in Figure1.It’s easily to get that slca(S1,S2,...S m) =RemoveAncestor(lca(S1,S2,...S m)).Given a query Q which contains a list of m keywords k1,k2,...k m(for ease of presentation,we don’t make any distinguish between query(k1,k2,...k m)and query(“k1k2...k m”)with an input XML document D),the answers of slca(k1,k2,...k m)are the result nodes of slca(S1,S2,...S m), where S i denotes the sorted keyword list of k i for1≤i≤m,i.e.,the list of nodes whose label directly contains k i sorted by id.2.2SLCA based Keyword Search with Synonym RulesGiven a query Q={k1,k2,...k m},the Indexed Lookup Eager Algorithm(IL)[1]first get the sorted inverted list S i for each k i,and then sort the inverted list according to thesize of S i to make sure S1is the list with smallest size.IL algorithm compute their slca by slca(slca(S1,...S m−1),S m).When computing any slca(S i,S j)for1≤i<j≤m and i=j−1,the algorithm efficiently removing the ancestor nodes according to the following two lemmas.Lemma1.For any two nodes v i,v j and s set S,if pre(v i)<pre(v j)and pre(slca(v i,S))> pre(slca(v j,S))then slca(v i,S)≺slca(v j,S)Lemma2.Given any two nodes v i,v j and s set S such that pre(v i)<pre(v j)and pre(slca(v i,S)) <pre(slca(v j,S)),if slca(v i,S)is not an ancestor of slca(v j,S),then for any v such that pre(v) >pre(v j),slca(v,S)⊀slca(v j,S)Given a set of synonym rules R and a query Q,we denoted the slca as slca(Q,R).When introducing the concept of synonym rules,we shouldfirst get all the possible strings that could be produced after applying rules.Then we can compute the query result for each generated string.we can get thefinal answer by removing the ancestors of these query results.This can be formally defined by the following expression where transform(Q,R)denotes all the strings generated by applying rules.slca(Q,R)={removeAncestor(slca(Q1),...Q i)...slca(Q k)),Q i∈transform(Q,R),1≤i≤k}slcara Jennie database slcadata base(a)slca(Q2,Q2)slcara Jenniedatabase slcadata base(b)slca(Q2,Q2)slcara Jenniera basedata datum(c)slca(Q2,Q2)slcara slcaraJennieslca databasedata baseJennydata(d)slca(Q2,Q2)图2:Possible TransformationNext,we will give a deeply analysis for each possible transformation.Wefirst give the sorted inverted list for the keywords used later as shown in Table1.表1:part of sorted inverted keyword list in Figure1Jennie S1=[0.1.2.1.0,0.1.3.1.0,0.2.0.0,0.3.1.1.0]database S2=[0.1.0.0,0.1.3.0.0,0.3.0.0]data S3=[0.1.1.0.0,0.3.1.0.0,0.3.2.1.0]base S4=[0.1.1.0.0,0.1.2.0.0,0.1.2.2.0,0.3.1.0.0,0.3.2.0.0]datum S5=[0.1.2.0.0]Jenny S6=[0.0.0,0.1.1.1.0]Condition1,Split:Given a query Q=“database Jennie”in Figure1and a set of synonym rules R=“database”→“data base”.After transformation,the query will be expanded to Q1=“database Jennie”and Q2=“data base Jennie”.Therefore,we shouldfirst compute slca(Q1)and slca(Q2)and then remove the ancestor nodes to get thefinal result.As the keyword list for“data”(S3),“base”(S4),“database”(S2)and“Jennie”(S1)are listed in Table1. For slca(Q2),we shouldfirst compute slca(S3,S4)and then compute their slca with S1.The procedure can be expressed as slca(Q,R)=ra(slca(S2,S1),slca(S3,S4,S1))where ra()denotes remove ancestor operation.As we know“database”and“data base”represents the same object,then the expression can be denoted as slca(Q,R)=slca(ra(slca(S3,S4),S2),S1)which could be modeled as a tree as shown in Figure3(a).The optimize technique here we used is to execute ra operation as soon as possible due to slca is much more time-consuming compared with ra.Some early pruned nodes(by ra)would have no chance to participate in the next slca operation.Then,we can get thefinal result by slca(Q,R)=slca(ra([0.1.1.0.0,0.3.1.0.0, 0.3.2],[0.1.0.0,0.1.3.0.0,0.3.0.0]),S1)=slca([0.1.1.0.0,0.3.1.0.0,0.3.2,0.1.0.0,0.1.3.0.0,0.3.0.0], [0.1.2.1.0,0.1.3.1.0,0.2.0.0.0,0.3.1.1.0,0.3.2.1.0])=[0.1.3,0.3.1,0.3.2].Condition2,Merge:Considering a query Q=“data base Jennie”in Figure1and a set of synonym rules R=“data base”→“database”.we will get new strings Q1=“data baseJennie”and Q2=“database Jennie”after transformation.We can see the procedure is very similar to Condition1.slca(Q,R)=RemoveAncestor(Q1,Q2)=[0.1.3,0.3.1,0.3.2].The corresponding tree model is shown in Figure3(b).Condition3,Mix:For query Q=“data base Jennie”in Figure1and a set of synonym rules R=“data base”→“database”,“data”→“datum”.The new generated queries will be Q1=“data base Jennie”,Q2=“database Jennie”and Q3=“datum base Jennie”.According to the xml tree model as shown in Figure3(c),we can easily get the expression of the procedure slca(Q,R)=slca(Q1,Q2,Q3).Thus the result will be slca(Q,R)=[0.1.2,0.1.3,0.3.1,0.3.2].Condition4,Overlap:Given a qeury Q=“data base Jennie”in Figure1and a set of synonym rules R=“data base”→“database”,“base Jennie”→“Jenny”.The new queries will be Q1=“data base Jennie”,Q1=“database Jennie”and Q1=“data Jenny”.The result will be[0.1.1,0.1.3,0.3.1,0.3.2](Figure3(d)).We have listed all the possible conditions we might meet when introducing the synonym rule semantics.Split means the one substring are expanded to more strings according to the size.Similarly,Merge refers to the inverse process.While,Mix and Overlap refers to the condition that Split and Merge come up with the same substring.For example,“data base”could be expanded to“data base”,“database”and“datum base”according to R= {‘‘database′′→‘‘database′′,‘‘data′′→‘‘datum′′}.However,the overhead of computing slca based on synonym rules is much too expensive.The cost will be in exponential scale to the size of useful synonym rules.3OptimizationsIn this section,we will give some optimization techniques to speed up the procedure of computing slcas when introducing synonym rules.Given a string s and a set of synonym rules R,the goal is tofind all possible matching transformations.The observation is that if lhs is a substring of s,then lhs is a prefix of some suffix of s.We can construct a trie over all the distinct lhs in T and then process every string in lhs of R.After that,for a given string,use each of its suffixes to look up the trie,i.e.we need to traversal each input string from rear to head in word level.For each substring we then need to scan from head to rear to check whether there is a matching for the sub of this substring. The details are straightforward and we defer them to the full version of the paper.Traditional strategiesfirst get all sorted inverted list for each keyword occurs in the XML Document and then maintain a B+structure to speed up the look up option as shown in Figure 3(a).Differently,apart from the B+structure we also store the slca of the right-side of each rule.i.e.,given a rule“lhs”→“rhs”,we pre-compute slca(“rhs”)and then store the result into the trie which is called synonym trie as shown in Figure3(b).Note that,we only store slca of string with non-empty result.The benefit is that when getting a transformation,we also getadministratorarticle abstract author bib chair base conference databaseinproceedings data Jennie name paper Jenny title data Jenny base 0.1.2.20.3.2.20.00.1.30.1.1.10.1.2.10.1.3.10.3.1.10.3.2.10.1.1.0.00.1.2.0.00.1.2.2.00.3.1.0.00.3.2.0.000.2.00.10.30.1.1.0.00.1.2.0.00.3.1.0.00.3.2.2.00.1.0.00.1.3.0.00.3.0.00.1.10.3.20.0.00.1.1.1.00.2.0.00.1.2.1.00.1.3.1.00.3.1.1.00.3.2.1.00.1.00.3.00.1.20.3.10.1.1.00.1.2.00.1.3.00.3.1.00.3.2.0(a)B+Tree from the data of Figure 1lhs(b)Synonym Trie图3:Index Structurethe slca of “rhs”.3.1Transformation Matching based IL AlgorithmNext we will explain how to use this synonym trie to speed up our query procedure.Here we combine transformation matching operation and IL algorithm together denoted as Transformation Matching based Indexed Lookup Eager Algorithm(TM-IL).The algorithm is based on transformation matching operations.As shown in Algorithm 1,♯6-10denotes the IL Algorithm,while ♯11-16denotes the matching transformations.when there is a matching,we need to query the slca result of this matching substring(♯11,TMIL)and then pass the query result to next iteration(♯15,TMIL).However,if the query result is null,this might happen due to i)no matching,ii)no keywords in the xml document for the query,and iii)null slca for the matching keywords,the algorithm will goes into next loop.Finally,if we reach to the head of input string s ,the algorithm will call computeSlca(canList)to compute and return slca of candidate inverted lists(♯20,TMIL).For example,we consider again the query used in Section 2.2,Condition 4applied on the data of Figure 1.“base Jennie”will be firstly detected(♯7,TMIL),then we will get slca(“Jenny”)=S 1as shown in Figure 1which will be added to the candidate list canList and passed to next iteration(♯10,TMIL).For the next iteration,s=“data”,canList={S 6}and rList=empty,it will get inverted list S 3for “data”at ♯7,and compute slca(S 3,S 6)at ♯8,then pass the result to next iteration(♯9).The result for “data Jennie”will be add to result list rList at ♯20for the next iteration.After all the return operations,the final result will be combined together(♯4,TMILCall).Algorithm1TM-IL AlgorithmProcedure TMILCall(s)1:result=empty;//receivefinal results2:rList=empty;//store intermediate keyword lists3:canList=empty;//store candidate keyword lists4:return result.addAll(TMIL(s,canList,rList));//combine the result and remove ancesotrsProcedure TMIL(s,canList,rList)1:if s!=null then2:for(i=|s|−1;i≥0;i−−)do3:s1=s.substring(i,|s|);4:for(j=1;j≥|s1|;j++)do5:s2=s1.substring(0,j);6:if(j==1)then7:queryResult=queryBPTree(s2);8:canList=slca(queryResult,canList)9:rList.addAll(TMIL(s.substring(0,i),canList,rList))10:end if11:qeuryResult=qeurySynonymTrie(s2);//query slca(s2)from synonym trie.12:if(qeuryResult!=null)then13:declare canList’and add queryResult to this list.14:canList’.add(IL(s1.substring(j,|s1|)));//IL denotes Index Lookup Eager Algorithm 15:rList.addAll(TMIL(s.substring(0,i),canList’,rList))16:end if17:end for18:end for19:else20:return computeSlca(canList);21:end ifProcedure computeSlca(canList)1:for each list in canList,compute their slca by IL();4Conclusion and Future WorkWe propose in this paper a SLCA based keyword search with synonym rules over xmldocuments.We formally defines the semantics of synonym rules and analyze the possiblematching transformations.In addition,we also apply this method to effectively and efficiently˖ compute slcas over XML Documents.To the best of our knowledge,this is thefirst work that takes into account of synonym rules for xml keyword search.Our theoretical analysis shows that our method is orthogonal to other xml keyword search techniques.There are several avenues for future work.For instance,we can use synonyms to improve the quality of ranking based xml keyword search,such as XSearch[4],XRank[5]and so on.参考文献(References)[1]Yu Xu and Yannis Papakonstantinou.Efficient keyword search for smallest lcas in xmldatabases.In SIGMOD Conference,pages537–538,2005.[2]Yu Xu and Yannis Papakonstantinou.Efficient lca based keyword search in xml data.InCIKM,pages1007–1010,2007.[3]Zhifeng Bao,Tok Wang Ling,Bo Chen,and Jiaheng Lu.Effective xml keyword search withrelevance oriented ranking.In ICDE,pages517–528,2009.[4]Sara Cohen,Jonathan Mamou,Yaron Kanza,and Yehoshua Sagiv.Xsearch:A semanticsearch engine for xml.In VLDB,pages45–56,2003.[5]Lin Guo,Feng Shao,Chavdar Botev,and Jayavel Shanmugasundaram.Xrank:Rankedkeyword search over xml documents.In SIGMOD Conference,pages16–27,2003.[6]Guoliang Li,Jianhua Feng,Jianyong Wang,and Lizhu Zhou.Effective keyword search forvaluable lcas over xml documents.In CIKM,pages31–40,2007.[7]Yoshimasa Tsuruoka,John McNaught,Jun ichi Tsujii,and Sophia Ananiadou.Learningstring similarity measures for gene/protein name dictionary look-up using logistic regression.Bioinformatics,23(20):2768–2774,2007.[8]G.Craig Murray and Jaime Teevan.Query log analysis:social and technological challenges.SIGIR Forum,41(2):112–120,2007.-11-。
软件开发中的文档自动生成技术
软件开发中的文档自动生成技术随着软件开发的快速发展,文档自动生成技术也越来越重要。
反复的手工编写文档不仅费时费力,而且容易出错。
自动生成文档不仅能够提高开发效率,而且还能够保证文档的一致性和准确性。
下面我们来看看现在最流行的几种文档自动生成技术。
一、XML文档自动生成技术XML技术具有很强的扩展性和可移植性,很适合用于文档自动生成。
目前,很多软件采用了XML技术,通过自动化脚本或者工具来生成文档。
XML技术的优势在于其分离数据和呈现的特点。
开发人员可以将数据存储在XML中,用样式表或者模板来呈现出文档。
这样一来,当数据发生变化时,只需要更新XML中的数据即可,文档的呈现形式就不需要再做修改。
二、Markdown文档自动生成技术Markdown是一种轻量级的标记语言,非常适合用于写技术文档。
使用Markdown编写的文档可以方便地转化为HTML、PDF、Word等格式。
目前,很多文本编辑器都支持Markdown语法。
Markdown语法非常简单易懂,甚至没有HTML复杂。
通过使用Markdown编辑器,代码规范可读,提高了工作效率。
三、Swagger文档自动生成技术Swagger是一个流行的API文档自动生成工具,适用于以Restful架构进行API设计的企业。
通过Swagger,开发人员可以定义API的各种参数,方便地生成API的文档,还可以在线测试API的调用。
Swagger支持多种语言,如Java、Python、Ruby等。
它将API 标准化,使得开发者可以更加关注于架构本身的设计,加快了API开发的速度。
四、Doxygen文档自动生成技术Doxygen是一款常用的文档自动生成工具,它能够根据代码中注释自动生成文档。
它支持多种语言,如C++、Java、Python等。
开发人员只需要在代码中添加注释,即可快速生成文档。
Doxygen的输出格式非常丰富,可以生成HTML、LaTeX、RTF、XML等格式的文档。
安全访问控制的XML关键字检索
0.1.0.1.1.0
0.1.3.0.3.0
Fig.1 XML document
图 1 XML 文档
李晓东 等:安全访问控制的 XML 关键字检索
75
通过文献[2]中的 SLCA 节点的定义可知,使用 SLCA 的方法可以从图 1 中得到四个 SLCA 节点(如图 1 中 用 黑 色 标 记 的 节 点):File(0 . 0 . 0 . 0)、File(0 . 1 . 0 . 0)、 File(0.1.0.1)和 Staf(f 0.1.3.0)。但显然不是所有的以 这四个节点为根的子树都满足安全访问约束。比如由 于部门号是“#0001”,文件 File(0.0.0.0)包含的所有 信息都不应该可见。通过下面的方法来保证返回给用 户的 SLCA 结果是安全的。
Company *
Dept
Files DeptName DeptNo * File
*
Staffs
* Staff
FileName Grade Signatory StaffName Age Salary Interest Grade
Fig.2 XML Schema 图 2 XML 模式
首先在 XML 文档 Schema 的每条边上加上安全 说明,安全说明分为三种标识:(1)可见;(2)不可见; (3)条件可见。每一种标识实际上对应了一条 Schema 路径所指向的节点,标明了对该节点的访问权限。基 于这种加了标识的 Schema,为 XML 文档生成一个特 殊的索引。每当用户提交一组关键字,利用文献[2]中 的 SLCA 生成算法,得到一个符合关键字查询的 SLCA 节点的集合 L。接着对集合 L 中的每个 SLCA 节点判 断其是否包含了不可访问节点,如果包含,则将这些 节点以及所有后代节点从以 SLCA 为根的最小应答 子树中裁剪掉,之后,观察最小应答子树的剩余部分, 判断是否包含所有关键字,如果不包含把这个 SLCA 节点从集合 L 中删除。否则,使用自底向上(bottomup)的方法,在 Schema 中从 SLCA 节点往上遍历,找 到当前节点到根节点的路径;然后利用索引,使用自 顶向下(top-down)的方法,从根节点沿着路径往下遍 历,判断所遇到的每个节点(包括 SLCA 的所有祖先
XML关键词检索的摘要生成方法[发明专利]
专利名称:XML关键词检索的摘要生成方法专利类型:发明专利
发明人:邓志鸿,江家健
申请号:CN201010614955.8
申请日:20101230
公开号:CN102004802A
公开日:
20110406
专利内容由知识产权出版社提供
摘要:本发明提供了一种XML关键词检索的摘要生成方法以及一种评判摘要重要程度的模型。
该模型包含三个评价要素:区分性,明确性,关联性。
区分性用于衡量属性a的区分性强弱,明确性用于衡量属性a对于查询Q的明确性,关联性用于衡量属性a与实体e间的关联性。
本发明提供的方法利用该模型对XML关键词检索的摘要的重要性进行定量分析,计算公式为W(e,a)=
(Dist(a)·Expl(a,Q)),然后选取最重要的top-K个属性作为描述实体的摘要,解决了传统XML关键词检索缺少对信息重要性的定量衡量的问题。
申请人:北京大学
地址:100871 北京市海淀区颐和园路5号
国籍:CN
代理机构:北京万象新悦知识产权代理事务所(普通合伙)
代理人:贾晓玲
更多信息请下载全文后查看。
Oracle之xml的增删改查操作
Oracle之xml的增删改查操作工作之余,总结一下xml操作的一些方法和心得!tip: xmltype函数是将clob字段转成xmltype类型的函数,若字段本身为xmltype类型则不需要引用xmltype()函数同名标签用数组取值的方式获取,但起始值从1开始一.查询(Query)1. extract函数,查询节点值,带节点名1-- 获取带节点的值,例如:<tel>222</tel>2selectextract(xmltype(e.prj_xml),'/data/project/persons/person[1]/tel').getStringV al() as title from project e where e.zh_title='白夜追逐繁星';3-- 备注如果节点表达式查询一个节点的父节点,则会将该父节点下的所有节点包含该父节点查出Query Result:tip: extract函数中路径引用text(),查询的节点若重复则自动拼接select extractvalue(xmltype('<a><b>1</b><b>2</b></a>'),'/a/b') from dual; -- 报错,报只返回一个节点值,因为在a标签下存在两个同名标签b selectextract(xmltype('<a><b>1</b><b>2</b></a>'),'/a/b/text()') from dual; -- extract+text() 解决同名节点问题,若存在重复节点会自动拼接在一起,但不使用任何拼接符号2. extractvalue函数,查询节点值,不带节点名-- 获取不带节点的值,例如:2221 selectextractvalue(xmltype(e.prj_xml),'/data/project/persons/person[1]/tel') asresult from project e where e.zh_title='白夜追逐繁星';Query Result:Tip: 节点不存在时,查询结果均为空3. existsnode函数,判断节点是否存在,表示存在,0表示不存在1 select existsnode(xmltype(e.prj_xml),'/data/project/persons/person[1]/tel') as result from project e where e.zh_title='白夜追逐繁星';Query Result:4. sys_dburigen,将指定列通过函数生成一个该列的URI值,例如:/PUBLIC/PROJECT/ROW[ZH_TITLE='邹成咁180117']/ZH_TITLE1 select sys_dburigen(e.zh_title) as result from project e where e.zh_title='白夜追逐繁星';Query Result:5. sys_xmlAgg,合并查询,将多个xml合并,类似于set集合-- sys_xmlGen 将xml转成xmltype实例,方便xml合并,sys_xmlAgg用于xml 合并1 select sys_xmlAgg(sys_xmlgen(e.prj_xml)) as result from project e wheree.zh_title='白夜追逐繁星'or e.zh_title='白夜追逐繁星2' ;Query Result:6. xmlforest,将指定列以xml格式查询出来,可指定生成的xml节点名称1 select xmlforest(e.zh_title as zhTitle,e.prj_no as prjNo,e.psn_code as psnCode).getStringVal() as xml from project e where e.zh_title='白夜追逐繁星';Query Result:7. xmlelement,为查询出来的xml添加挂载的父节点,并将xml字符串格式化成xml ,与xmlforest函数配套使用1 selectxmlelement(data,xmlforest(e.zh_title,e.prj_no,e.psn_code)).getStringVal() as xml from project e where e.zh_title='白夜追逐繁星';Query Result:延伸:为data节点添加属性,使用xmlattributes函数1 select xmlelement(data,xmlattributes(e.prj_code ascode),xmlforest(e.zh_title,e.prj_no,e.psn_code)).getStringVal() as xml from project e where e.zh_title='白夜追逐繁星';Query Result:延伸:XMLCOLATTVAL效果等同于xmlforest函数,但默认会为每个标签添加一个属性name,属性值为列明,若未指定列别名,默认该列列明1 select XMLCOLATTVAL(e.zh_title as zhTitle,e.prj_no as prjNo,e.psn_code as psnCode).getStringVal() as xml from project e where e.zh_title='白夜追逐繁星'Query Result:8. xmlConcat,xmlType实例之间联结1selectxmlelement(data,xmlConcat(xmltype('<a>1</a>'),xmltype('<b>1</b>'))).getStrin gVal() as result from dual;Query Result:9. xmlsequence将一个xml以标签为单位,转换成数组,也就是一行行记录1select e.getStringVal() as result2fromtable(xmlsequence(extract(xmltype('<a><b>233</b><c>666</c><d>88</d></a>'),'/ a/*'))) e;Query Result:二.添加(Insert)-- 添加xml节点,insertchildxml添加xml节点,参数3默认指定插在该节点后,若该节点不存在,则追加到子节点集合的末尾-- 添加xml节点,insertchildxmlbefore,和insertchildxmlafter添加xml节点,参数3指定插在该节点前或者后,若该节点不存在,则追加到子节点集合的末尾1update project e sete.prj_xml=insertchildxml(xmltype(e.prj_xml),'/data/project/persons/person[1] ','tel',xmltype('<tel>222</tel>')).getClobVal() where e.zh_title='白夜追逐繁星';2update project e sete.prj_xml=insertchildxmlbefore(xmltype(e.prj_xml),'/data/project/persons/per son[1]','psn_code',xmltype('<tel>111</tel>')).getClobVal() where e.zh_title='白夜追逐繁星';三.修改(Update)-- updatexml用于更新节点值1-- updatexml用于更新节点值,参数1:需要修改节点的xml字段;参数2:节点路径;参数3:值2update project e sete.prj_xml=updatexml(xmltype(e.prj_xml),'/data/project/persons/person[1]/tel','<tel>111</tel>').getClobVal() where e.zh_title='白夜追逐繁星';3update project e sete.prj_xml=updatexml(xmltype(e.prj_xml),'/data/project/persons/person[1]/tel/ text()','222').getClobVal() where e.zh_title='白夜追逐繁星';tip: getClobVal()是将xmltype类型转成clob类型方法四.删除(Delete)1-- 删除xml节点,参数1:需要删除节点的xml字段;参数2:节点路径;2update project e set e.prj_xml=deletexml(xmltype(e.prj_xml),'/data/project/persons/person[1]/tel' ).getClobVal() where e.zh_title='白夜追逐繁星';。