XML Parsers

XML Parser Interfaces
Abbreviations
XML eXtensible Markup Language
DOM Document Object Model
SAX Simple API for XML
JAXP Java API for XML Processing
DTD Document Type Definition
XSL Extensible Stylesheet Language
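XML (eXtensible Markup Language) is a subset of SGML (Standard Generalized Markup Language) and a standard format for exchanging data over the Internet.

The XML specification describes a class of data objects called XML documents and partially describes the behavior of the programs that process them.

Before an XML document is displayed in a browser or used by another application, it goes through several stages: parsing, stylesheet formatting, transformation, database access, and so on.

The XML parser performs the first of these steps: it reads the XML document, builds a syntax tree, checks the document's structure, and passes the result to the application.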

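An XML parser can be validating, checking that a document is valid (that it conforms to its DTD or schema), or non-validating, checking only that the document is well-formed.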
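To make the validating/non-validating distinction concrete, here is a minimal sketch in Java using JAXP, assuming the JDK's built-in parser. DocumentBuilderFactory.setValidating(true) switches on DTD validation; validity errors then arrive through ErrorHandler.error(), while well-formedness errors arrive through fatalError(). The class ValidationDemo, its method, and the sample document are illustrative only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class ValidationDemo {
    // Well-formed, but invalid: the internal DTD does not declare <extra>.
    static final String INVALID = "<?xml version=\"1.0\"?>"
            + "<!DOCTYPE note [<!ELEMENT note (#PCDATA)>]>"
            + "<note><extra/></note>";

    // Parses xml and returns the messages of recoverable (validity) errors.
    // Well-formedness errors are fatal and are rethrown as an exception.
    static List<String> validityErrors(String xml, boolean validating) {
        List<String> errors = new ArrayList<>();
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validating); // turn DTD validation on or off
        try {
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override
                public void warning(SAXParseException e) {
                    // warnings are ignored in this sketch
                }

                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage()); // validity error: keep parsing
                }

                @Override
                public void fatalError(SAXParseException e) throws SAXException {
                    throw e; // not well-formed: stop
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse: " + e.getMessage(), e);
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println("non-validating: " + validityErrors(INVALID, false));
        System.out.println("validating:     " + validityErrors(INVALID, true));
    }
}
```

The same well-formed document passes the non-validating parse untouched, while the validating parse reports that <extra> is not declared.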

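Applications written in C or Java, or in scripting languages such as JScript and VBScript (for example, in ASP pages), can manipulate the elements, attributes, entities, and markup of an XML document through the parser's interface.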

Common XML parsers:
1. MSXML, Microsoft's XML parser, which is not fully compatible with the W3C specifications.

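2. Xerces, an open-source parser from the Apache Software Foundation, implements the Xerces Native Interface (XNI) and supports XML 1.0, Namespaces in XML, DOM Level 2 (Core, Events, and Traversal and Range), SAX2 (Core and Extensions), JAXP 1.1, and XML Schema 1.0 (Structures and Datatypes).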

Xerces is available as C++ and Java implementations, plus a Perl binding built on the C++ version.

Applications can link against the Xerces library and call its interfaces to access XML documents.

3. XML4J and XML4C, IBM's extensions of Xerces.

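The XML4J API falls into three categories:
Public: the DOM Level 1 and Level 2 interfaces and the SAX1 and SAX2 interfaces
Experimental: the DOM Level 3 interfaces
Internal: internal interfaces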
DOM Interfaces:
The entire XML document is treated as a hierarchical tree of nodes in which every non-root node descends from the single root node.

For example, consider this fragment:
<TABLE>
<TBODY>
<TR>
<TD>Shady Grove</TD>
<TD>Aeolian</TD>
</TR>
<TR>
<TD>Over the River, Charlie</TD>
<TD>Dorian</TD>
</TR>
</TBODY>
</TABLE>
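DOM models this fragment as a tree: TABLE is the root; its single child is TBODY; TBODY has two TR children; each TR has two TD children; and each TD holds one text node, such as "Shady Grove" (ignoring the whitespace-only text between tags). The "tree" is only a logical model; an implementation may store the nodes in any way it likes.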

A depth-first, preorder traversal of this tree reproduces the document.
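The preorder claim can be checked with a short Java sketch (JAXP DOM, JDK built-in parser assumed). It parses the TABLE fragment with the line breaks removed, prints the logical tree, and re-serializes it by visiting each element before its children. PreorderDemo and its methods are hypothetical names; attributes, comments, and other node types are deliberately ignored.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class PreorderDemo {
    // The TABLE fragment from the text, with the line breaks removed.
    static final String TABLE = "<TABLE><TBODY>"
            + "<TR><TD>Shady Grove</TD><TD>Aeolian</TD></TR>"
            + "<TR><TD>Over the River, Charlie</TD><TD>Dorian</TD></TR>"
            + "</TBODY></TABLE>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Depth-first preorder: emit the start tag on first visit, recurse into
    // the children left to right, then close the tag. Only element and text
    // nodes are handled; everything else is skipped.
    static void serialize(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(node.getNodeValue().replace("&", "&amp;").replace("<", "&lt;"));
        } else if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.append('<').append(node.getNodeName()).append('>');
            for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
                serialize(c, out);
            }
            out.append("</").append(node.getNodeName()).append('>');
        }
    }

    static String roundTrip(String xml) {
        StringBuilder out = new StringBuilder();
        serialize(parse(xml).getDocumentElement(), out);
        return out.toString();
    }

    // Prints the logical tree, one node per line, indented by depth.
    static void printTree(Node node, String indent) {
        boolean isText = node.getNodeType() == Node.TEXT_NODE;
        System.out.println(indent + (isText ? "\"" + node.getNodeValue() + "\"" : node.getNodeName()));
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            printTree(c, indent + "  ");
        }
    }

    public static void main(String[] args) {
        printTree(parse(TABLE).getDocumentElement(), "");
        System.out.println(roundTrip(TABLE).equals(TABLE));
    }
}
```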

Each DOM node type has a corresponding interface.

The DOM node types are: Document, Element, Attr, Text, CDATASection, EntityReference, Entity, ProcessingInstruction, Comment, DocumentType, DocumentFragment, and Notation.
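In the Java binding of DOM (org.w3c.dom), the node type is exposed as short constants on Node, so code typically dispatches on getNodeType(). The sketch below, with a hypothetical NodeTypeDemo class and sample document, names the interface behind each child node. It assumes JAXP's defaults, under which comments are kept and CDATA sections are not merged into the surrounding text.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.StringJoiner;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class NodeTypeDemo {
    static final String SAMPLE = "<?xml version=\"1.0\"?>"
            + "<!-- top --><?app hint?>"
            + "<r>text<![CDATA[x<y]]><!-- inner --></r>";

    // Maps a nodeType code to the name of the DOM interface it stands for.
    static String interfaceName(short type) {
        switch (type) {
            case Node.ELEMENT_NODE: return "Element";
            case Node.ATTRIBUTE_NODE: return "Attr";
            case Node.TEXT_NODE: return "Text";
            case Node.CDATA_SECTION_NODE: return "CDATASection";
            case Node.ENTITY_REFERENCE_NODE: return "EntityReference";
            case Node.ENTITY_NODE: return "Entity";
            case Node.PROCESSING_INSTRUCTION_NODE: return "ProcessingInstruction";
            case Node.COMMENT_NODE: return "Comment";
            case Node.DOCUMENT_NODE: return "Document";
            case Node.DOCUMENT_TYPE_NODE: return "DocumentType";
            case Node.DOCUMENT_FRAGMENT_NODE: return "DocumentFragment";
            case Node.NOTATION_NODE: return "Notation";
            default: return "unknown";
        }
    }

    // Names the interface of each child of parent, in document order.
    static String childInterfaces(Node parent) {
        StringJoiner names = new StringJoiner(",");
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            names.add(interfaceName(c.getNodeType()));
        }
        return names.toString();
    }

    // Describes the children of the Document node, then those of the root element.
    static String describe(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
        return childInterfaces(doc) + " | " + childInterfaces(doc.getDocumentElement());
    }

    public static void main(String[] args) {
        System.out.println(describe(SAMPLE));
    }
}
```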

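DOM Level 1 consists of two parts: the Core DOM and the HTML DOM.

The Core DOM is further divided into fundamental interfaces and extended interfaces. Every DOM implementation must implement the fundamental interfaces; an implementation that handles only HTML documents need not implement the extended interfaces.

The HTML DOM describes the objects and methods specific to HTML documents; see the relevant specification, as it is not covered here.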

Fundamental interfaces:
DOMException interface:
exception DOMException {
unsigned short code;
};
// ExceptionCode
const unsigned short INDEX_SIZE_ERR = 1;
const unsigned short DOMSTRING_SIZE_ERR = 2;
const unsigned short HIERARCHY_REQUEST_ERR = 3;
const unsigned short WRONG_DOCUMENT_ERR = 4;
const unsigned short INVALID_CHARACTER_ERR = 5;
const unsigned short NO_DATA_ALLOWED_ERR = 6;
const unsigned short NO_MODIFICATION_ALLOWED_ERR = 7;
const unsigned short NOT_FOUND_ERR = 8;
const unsigned short NOT_SUPPORTED_ERR = 9;
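const unsigned short INUSE_ATTRIBUTE_ERR = 10;
DOMImplementation interface: provides operations that do not depend on any particular document instance, such as querying which features the implementation supports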
interface DOMImplementation {
boolean hasFeature(in DOMString feature,
in DOMString version);
};
DocumentFragment interface: a lightweight container used to move or insert subtrees
interface DocumentFragment : Node {
};
Document interface:
interface Document : Node {
readonly attribute DocumentType doctype;
readonly attribute DOMImplementation implementation;
readonly attribute Element documentElement;
Element createElement(in DOMString tagName)
raises(DOMException);
DocumentFragment createDocumentFragment();
Text createTextNode(in DOMString data);
Comment createComment(in DOMString data);
CDATASection createCDATASection(in DOMString data)
raises(DOMException);
ProcessingInstruction createProcessingInstruction(in DOMString target,
in DOMString data)
raises(DOMException);
Attr createAttribute(in DOMString name)
raises(DOMException);
EntityReference createEntityReference(in DOMString name)
raises(DOMException);
NodeList getElementsByTagName(in DOMString tagname);
};
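A minimal Java sketch of the Document factory methods and DOMException, assuming JAXP is used to obtain an empty Document: every node is created through its owner Document, and an illegal XML name makes createElement raise INVALID_CHARACTER_ERR. FactoryDemo and its methods are illustrative names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class FactoryDemo {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds <TR><TD>cell</TD>...</TR> using only Document factory methods and
    // appendChild; the new nodes belong to doc but are not yet in its tree.
    static Element buildRow(Document doc, String... cells) {
        Element tr = doc.createElement("TR");
        for (String cell : cells) {
            Element td = doc.createElement("TD");
            td.appendChild(doc.createTextNode(cell));
            tr.appendChild(td);
        }
        return tr;
    }

    // Returns the DOMException code raised by createElement, or -1 if none.
    static int createElementError(String tagName) {
        try {
            newDocument().createElement(tagName);
            return -1;
        } catch (DOMException e) {
            return e.code;
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        doc.appendChild(buildRow(doc, "Shady Grove", "Aeolian"));
        System.out.println(doc.getDocumentElement().getTextContent());
        System.out.println(createElementError("1TD") == DOMException.INVALID_CHARACTER_ERR);
    }
}
```

A name may not begin with a digit in XML, which is why "1TD" is rejected while "TD" is accepted.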
Node interface:
interface Node {
// NodeType
const unsigned short ELEMENT_NODE = 1;
const unsigned short ATTRIBUTE_NODE = 2;
const unsigned short TEXT_NODE = 3;
const unsigned short CDATA_SECTION_NODE = 4;
const unsigned short ENTITY_REFERENCE_NODE = 5;
const unsigned short ENTITY_NODE = 6;
const unsigned short PROCESSING_INSTRUCTION_NODE = 7;
const unsigned short COMMENT_NODE = 8;
const unsigned short DOCUMENT_NODE = 9;
const unsigned short DOCUMENT_TYPE_NODE = 10;
const unsigned short DOCUMENT_FRAGMENT_NODE = 11;
const unsigned short NOTATION_NODE = 12;
readonly attribute DOMString nodeName;
attribute DOMString nodeValue;
readonly attribute unsigned short nodeType;
readonly attribute Node parentNode;
readonly attribute NodeList childNodes;
readonly attribute Node firstChild;
readonly attribute Node lastChild;
readonly attribute Node previousSibling;
readonly attribute Node nextSibling;
readonly attribute NamedNodeMap attributes;
readonly attribute Document ownerDocument;
Node insertBefore(in Node newChild,
in Node refChild)
raises(DOMException);
Node replaceChild(in Node newChild,
in Node oldChild)
raises(DOMException);
Node removeChild(in Node oldChild)
raises(DOMException);
Node appendChild(in Node newChild)
raises(DOMException);
boolean hasChildNodes();
Node cloneNode(in boolean deep);
};
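The Node editing methods can be exercised as follows (Java, JAXP assumed; NodeEditDemo is a hypothetical name). The comments track the children of root after each call, and the last method shows HIERARCHY_REQUEST_ERR being raised when a node would become a child of its own descendant.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class NodeEditDemo {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Comma-separated names of parent's children, found via firstChild/nextSibling.
    static String childNames(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (sb.length() > 0) sb.append(',');
            sb.append(c.getNodeName());
        }
        return sb.toString();
    }

    static String editSequence() {
        Document doc = newDocument();
        Element root = doc.createElement("root");
        doc.appendChild(root);
        Element a = doc.createElement("a");
        Element c = doc.createElement("c");
        root.appendChild(a);
        root.appendChild(c);                                     // a,c
        root.insertBefore(doc.createElement("b"), c);            // a,b,c
        root.replaceChild(doc.createElement("z"), a);            // z,b,c
        root.removeChild(c);                                     // z,b
        root.appendChild(root.getFirstChild().cloneNode(true));  // z,b,z
        return childNames(root);
    }

    // Appending an ancestor under its descendant would create a cycle.
    static int cycleErrorCode() {
        Document doc = newDocument();
        Element parent = doc.createElement("p");
        Element child = doc.createElement("c");
        parent.appendChild(child);
        try {
            child.appendChild(parent);
            return -1;
        } catch (DOMException e) {
            return e.code;
        }
    }

    public static void main(String[] args) {
        System.out.println(editSequence());
        System.out.println(cycleErrorCode() == DOMException.HIERARCHY_REQUEST_ERR);
    }
}
```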
NodeList interface:
interface NodeList {
Node item(in unsigned long index);
readonly attribute unsigned long length;
};
NamedNodeMap interface:
interface NamedNodeMap {
Node getNamedItem(in DOMString name);
Node setNamedItem(in Node arg)
raises(DOMException);
Node removeNamedItem(in DOMString name)
raises(DOMException);
Node item(in unsigned long index);
readonly attribute unsigned long length;
};
Attr interface:
interface Attr : Node {
readonly attribute DOMString name;
readonly attribute boolean specified;
attribute DOMString value;
};
Element interface:
interface Element : Node {
readonly attribute DOMString tagName;
DOMString getAttribute(in DOMString name);
void setAttribute(in DOMString name,
in DOMString value)
raises(DOMException);
void removeAttribute(in DOMString name)
raises(DOMException);
Attr getAttributeNode(in DOMString name);
Attr setAttributeNode(in Attr newAttr)
raises(DOMException);
Attr removeAttributeNode(in Attr oldAttr)
raises(DOMException);
NodeList getElementsByTagName(in DOMString name);
void normalize();
};
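A short Java sketch of the Element and NamedNodeMap methods, assuming JAXP parsing; AttributeDemo and the sample TD element are hypothetical. NamedNodeMap does not keep its nodes in any particular order, so the sketch sorts the attributes before returning them.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class AttributeDemo {
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Reads every attribute through the NamedNodeMap returned by getAttributes().
    static List<String> attributes(Element e) {
        NamedNodeMap map = e.getAttributes();
        List<String> out = new ArrayList<>();
        for (int i = 0; i < map.getLength(); i++) {
            Attr a = (Attr) map.item(i);
            out.add(a.getName() + "=" + a.getValue());
        }
        Collections.sort(out); // the map itself has no defined order
        return out;
    }

    static String edit() {
        Element td = parseRoot("<TD mode='Aeolian' id='t1'>Shady Grove</TD>");
        td.setAttribute("mode", "Dorian"); // replaces the existing value
        td.setAttribute("lang", "en");     // adds a new attribute
        td.removeAttribute("id");
        return String.join(";", attributes(td));
    }

    public static void main(String[] args) {
        System.out.println(edit());
    }
}
```

For an attribute that is absent, getAttribute returns the empty string rather than null, as the DOM specification requires.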
CharacterData interface:
interface CharacterData : Node {
attribute DOMString data;
readonly attribute unsigned long length;
DOMString substringData(in unsigned long offset,
in unsigned long count)
raises(DOMException);
void appendData(in DOMString arg)
raises(DOMException);
void insertData(in unsigned long offset,
in DOMString arg)
raises(DOMException);
void deleteData(in unsigned long offset,
in unsigned long count)
raises(DOMException);
void replaceData(in unsigned long offset,
in unsigned long count,
in DOMString arg)
raises(DOMException);
};
Text interface:
interface Text : CharacterData {
Text splitText(in unsigned long offset)
raises(DOMException);
};
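The CharacterData and Text methods operate on offsets into the node's character data. This Java sketch (JAXP assumed; TextDemo is hypothetical) reads a substring from one of the example cells and then splits it in two; the comments give the value after each call.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

class TextDemo {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    static String demo() {
        Document doc = newDocument();
        Element td = doc.createElement("TD");
        Text text = doc.createTextNode("Over the River, Charlie");
        td.appendChild(text);
        String word = text.substringData(9, 5); // "River"
        Text tail = text.splitText(14);         // text = "Over the River", tail = ", Charlie"
        tail.deleteData(0, 2);                  // tail = "Charlie"
        text.appendData("!");                   // text = "Over the River!"
        // splitText inserted tail as text's next sibling, so td has 2 children.
        return word + "|" + text.getData() + "|" + tail.getData()
                + "|" + td.getChildNodes().getLength();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```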
Comment interface:
interface Comment : CharacterData {
};
Extended interfaces:
CDATASection interface: represents a CDATA section, which starts with <![CDATA[ and ends with ]]>
interface CDATASection : Text {
};
DocumentType interface:
interface DocumentType : Node {
readonly attribute DOMString name;
readonly attribute NamedNodeMap entities;
readonly attribute NamedNodeMap notations;
};
Notation interface:
interface Notation : Node {
readonly attribute DOMString publicId;
readonly attribute DOMString systemId;
};
Entity interface:
interface Entity : Node {
readonly attribute DOMString publicId;
readonly attribute DOMString systemId;
readonly attribute DOMString notationName;
};
EntityReference interface:
interface EntityReference : Node {
};
ProcessingInstruction interface:
interface ProcessingInstruction : Node {
readonly attribute DOMString target;
attribute DOMString data;
};
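DOM Level 2 includes, among other modules, the Core specification and the Traversal and Range specification.

DOM Level 3 includes the Core specification and the Abstract Schemas and Load and Save specification.

The DOM Level 2 and Level 3 Core specifications largely keep the DOM Level 1 interface structure; they add the concept of namespaces and add, remove, or modify a small number of attributes and methods. The Level 3 IDL below follows a working draft rather than the final Recommendation; in the final version, for example, compareTreePosition became compareDocumentPosition and actualEncoding became inputEncoding.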
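Namespace support in DOM Level 2 is visible through createElementNS, getElementsByTagNameNS, and the namespaceURI, prefix, and localName attributes. In JAXP the parser must be made namespace-aware explicitly; the sketch below (Java, NamespaceDemo hypothetical) does so, and the URI urn:example:music is made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class NamespaceDemo {
    // Hypothetical namespace URI, used only for this example.
    static final String NS = "urn:example:music";

    static Document parseNamespaceAware(String xml) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // the JAXP default is false
        try {
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    static String describe() {
        Document doc = parseNamespaceAware(
                "<m:song xmlns:m='" + NS + "'><m:mode>Dorian</m:mode></m:song>");
        Element root = doc.getDocumentElement();
        // DOM Level 2 adds namespaceURI, prefix and localName to Node.
        String names = root.getNodeName() + " " + root.getPrefix() + " "
                + root.getLocalName() + " " + root.getNamespaceURI();
        // Level 2 lookup by (namespaceURI, localName) ...
        int byNs = doc.getElementsByTagNameNS(NS, "mode").getLength();
        // ... versus Level 1 lookup by the qualified name as written.
        int byQName = doc.getElementsByTagName("m:mode").getLength();
        // createElementNS splits the qualified name into prefix and localName.
        Element key = doc.createElementNS(NS, "m:key");
        return names + " " + byNs + " " + byQName + " " + key.getLocalName();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```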

module dom
{
valuetype DOMString sequence<unsigned short>;
typedef unsigned long long DOMTimeStamp;
typedef Object DOMUserData;
typedef Object DOMObject;
interface DOMImplementation;
interface DocumentType;
interface Document;
interface NodeList;
interface NamedNodeMap;
interface UserDataHandler;
interface Element;
interface DOMLocator;
exception DOMException {
unsigned short code;
};
// ExceptionCode
const unsigned short INDEX_SIZE_ERR = 1;
const unsigned short DOMSTRING_SIZE_ERR = 2;
const unsigned short HIERARCHY_REQUEST_ERR = 3;
const unsigned short WRONG_DOCUMENT_ERR = 4;
const unsigned short INVALID_CHARACTER_ERR = 5;
const unsigned short NO_DATA_ALLOWED_ERR = 6;
const unsigned short NO_MODIFICATION_ALLOWED_ERR = 7;
const unsigned short NOT_FOUND_ERR = 8;
const unsigned short NOT_SUPPORTED_ERR = 9;
const unsigned short INUSE_ATTRIBUTE_ERR = 10;
// Introduced in DOM Level 2:
const unsigned short INVALID_STATE_ERR = 11;
// Introduced in DOM Level 2:
const unsigned short SYNTAX_ERR = 12;
// Introduced in DOM Level 2:
const unsigned short INVALID_MODIFICATION_ERR = 13;
// Introduced in DOM Level 2:
const unsigned short NAMESPACE_ERR = 14;
// Introduced in DOM Level 2:
const unsigned short INVALID_ACCESS_ERR = 15;
// Introduced in DOM Level 3:
const unsigned short VALIDATION_ERR = 16;
interface DOMImplementationSource {
DOMImplementation getDOMImplementation(in DOMString features);
};
interface DOMImplementation {
boolean hasFeature(in DOMString feature,
in DOMString version);
// Introduced in DOM Level 2:
DocumentType createDocumentType(in DOMString qualifiedName,
in DOMString publicId,
in DOMString systemId)
raises(DOMException);
// Introduced in DOM Level 2:
Document createDocument(in DOMString namespaceURI,
in DOMString qualifiedName,
in DocumentType doctype)
raises(DOMException);
// Introduced in DOM Level 3:
DOMImplementation getInterface(in DOMString feature);
};
interface Node {
// NodeType
const unsigned short ELEMENT_NODE = 1;
const unsigned short ATTRIBUTE_NODE = 2;
const unsigned short TEXT_NODE = 3;
const unsigned short CDATA_SECTION_NODE = 4;
const unsigned short ENTITY_REFERENCE_NODE = 5;
const unsigned short ENTITY_NODE = 6;
const unsigned short PROCESSING_INSTRUCTION_NODE = 7;
const unsigned short COMMENT_NODE = 8;
const unsigned short DOCUMENT_NODE = 9;
const unsigned short DOCUMENT_TYPE_NODE = 10;
const unsigned short DOCUMENT_FRAGMENT_NODE = 11;
const unsigned short NOTATION_NODE = 12;
readonly attribute DOMString nodeName;
attribute DOMString nodeValue;
readonly attribute unsigned short nodeType;
readonly attribute Node parentNode;
readonly attribute NodeList childNodes;
readonly attribute Node firstChild;
readonly attribute Node lastChild;
readonly attribute Node previousSibling;
readonly attribute Node nextSibling;
readonly attribute NamedNodeMap attributes;
// Modified in DOM Level 2:
readonly attribute Document ownerDocument;
// Modified in DOM Level 3:
Node insertBefore(in Node newChild,
in Node refChild)
raises(DOMException);
// Modified in DOM Level 3:
Node replaceChild(in Node newChild,
in Node oldChild)
raises(DOMException);
// Modified in DOM Level 3:
Node removeChild(in Node oldChild)
raises(DOMException);
Node appendChild(in Node newChild)
raises(DOMException);
boolean hasChildNodes();
Node cloneNode(in boolean deep);
// Modified in DOM Level 2:
void normalize();
// Introduced in DOM Level 2:
boolean isSupported(in DOMString feature,
in DOMString version);
// Introduced in DOM Level 2:
readonly attribute DOMString namespaceURI;
// Introduced in DOM Level 2:
attribute DOMString prefix;
// Introduced in DOM Level 2:
readonly attribute DOMString localName;
// Introduced in DOM Level 2:
boolean hasAttributes();
// Introduced in DOM Level 3:
readonly attribute DOMString baseURI;
// TreePosition
const unsigned short TREE_POSITION_PRECEDING = 0x01;
const unsigned short TREE_POSITION_FOLLOWING = 0x02;
const unsigned short TREE_POSITION_ANCESTOR = 0x04;
const unsigned short TREE_POSITION_DESCENDANT = 0x08;
const unsigned short TREE_POSITION_EQUIVALENT = 0x10;
const unsigned short TREE_POSITION_SAME_NODE = 0x20;
const unsigned short TREE_POSITION_DISCONNECTED = 0x00;
// Introduced in DOM Level 3:
unsigned short compareTreePosition(in Node other);
// Introduced in DOM Level 3:
attribute DOMString textContent;
// Introduced in DOM Level 3:
boolean isSameNode(in Node other);
// Introduced in DOM Level 3:
DOMString lookupNamespacePrefix(in DOMString namespaceURI,
in boolean useDefault);
// Introduced in DOM Level 3:
boolean isDefaultNamespace(in DOMString namespaceURI);
// Introduced in DOM Level 3:
DOMString lookupNamespaceURI(in DOMString prefix);
// Introduced in DOM Level 3:
boolean isEqualNode(in Node arg);
// Introduced in DOM Level 3:
Node getInterface(in DOMString feature);
// Introduced in DOM Level 3:
DOMUserData setUserData(in DOMString key,
in DOMUserData data,
in UserDataHandler handler);
// Introduced in DOM Level 3:
DOMUserData getUserData(in DOMString key);
};
interface NodeList {
Node item(in unsigned long index);
readonly attribute unsigned long length;
};
interface NamedNodeMap {
Node getNamedItem(in DOMString name);
Node setNamedItem(in Node arg)
raises(DOMException);
Node removeNamedItem(in DOMString name)
raises(DOMException);
Node item(in unsigned long index);
readonly attribute unsigned long length;
// Introduced in DOM Level 2:
Node getNamedItemNS(in DOMString namespaceURI,
in DOMString localName);
// Introduced in DOM Level 2:
Node setNamedItemNS(in Node arg)
raises(DOMException);
// Introduced in DOM Level 2:
Node removeNamedItemNS(in DOMString namespaceURI,
in DOMString localName)
raises(DOMException);
};
interface CharacterData : Node {
attribute DOMString data;
// raises(DOMException) on setting
// raises(DOMException) on retrieval
readonly attribute unsigned long length;
DOMString substringData(in unsigned long offset,
in unsigned long count)
raises(DOMException);
void appendData(in DOMString arg)
raises(DOMException);
void insertData(in unsigned long offset,
in DOMString arg)
raises(DOMException);
void deleteData(in unsigned long offset,
in unsigned long count)
raises(DOMException);
void replaceData(in unsigned long offset,
in unsigned long count,
in DOMString arg)
raises(DOMException);
};
interface Attr : Node {
readonly attribute DOMString name;
readonly attribute boolean specified;
attribute DOMString value;
// Introduced in DOM Level 2:
readonly attribute Element ownerElement;
};
interface Element : Node {
readonly attribute DOMString tagName;
DOMString getAttribute(in DOMString name);
void setAttribute(in DOMString name,
in DOMString value)
raises(DOMException);
void removeAttribute(in DOMString name)
raises(DOMException);
Attr getAttributeNode(in DOMString name);
Attr setAttributeNode(in Attr newAttr)
raises(DOMException);
Attr removeAttributeNode(in Attr oldAttr)
raises(DOMException);
NodeList getElementsByTagName(in DOMString name);
// Introduced in DOM Level 2:
DOMString getAttributeNS(in DOMString namespaceURI,
in DOMString localName);
// Introduced in DOM Level 2:
void setAttributeNS(in DOMString namespaceURI,
in DOMString qualifiedName,
in DOMString value)
raises(DOMException);
// Introduced in DOM Level 2:
void removeAttributeNS(in DOMString namespaceURI,
in DOMString localName)
raises(DOMException);
// Introduced in DOM Level 2:
Attr getAttributeNodeNS(in DOMString namespaceURI,
in DOMString localName);
// Introduced in DOM Level 2:
Attr setAttributeNodeNS(in Attr newAttr)
raises(DOMException);
// Introduced in DOM Level 2:
NodeList getElementsByTagNameNS(in DOMString namespaceURI,
in DOMString localName);
// Introduced in DOM Level 2:
boolean hasAttribute(in DOMString name);
// Introduced in DOM Level 2:
boolean hasAttributeNS(in DOMString namespaceURI,
in DOMString localName);
};
interface Text : CharacterData {
Text splitText(in unsigned long offset)
raises(DOMException);
// Introduced in DOM Level 3:
readonly attribute boolean isWhitespaceInElementContent;
// Introduced in DOM Level 3:
readonly attribute DOMString wholeText;
// Introduced in DOM Level 3:
Text replaceWholeText(in DOMString content)
raises(DOMException);
};
interface Comment : CharacterData {
};
interface UserDataHandler {
// OperationType
const unsigned short NODE_CLONED = 1;
const unsigned short NODE_IMPORTED = 2;
const unsigned short NODE_DELETED = 3;
const unsigned short NODE_RENAMED = 4;
void handle(in unsigned short operation,
in DOMString key,
in DOMObject data,
in Node src,
in Node dst);
};
interface DOMError {
const unsigned short SEVERITY_WARNING = 0;
const unsigned short SEVERITY_ERROR = 1;
const unsigned short SEVERITY_FATAL_ERROR = 2;
readonly attribute unsigned short severity;
readonly attribute DOMString message;
readonly attribute Object relatedException;
readonly attribute DOMLocator location;
};
interface DOMErrorHandler {
boolean handleError(in DOMError error);
};
interface DOMLocator {
readonly attribute long lineNumber;
readonly attribute long columnNumber;
readonly attribute long offset;
readonly attribute Node errorNode;
readonly attribute DOMString uri;
};
interface CDATASection : Text {
};
interface DocumentType : Node {
readonly attribute DOMString name;
readonly attribute NamedNodeMap entities;
readonly attribute NamedNodeMap notations;
// Introduced in DOM Level 2:
readonly attribute DOMString publicId;
// Introduced in DOM Level 2:
readonly attribute DOMString systemId;
// Introduced in DOM Level 2:
readonly attribute DOMString internalSubset;
};
interface Notation : Node {
readonly attribute DOMString publicId;
readonly attribute DOMString systemId;
};
interface Entity : Node {
readonly attribute DOMString publicId;
readonly attribute DOMString systemId;
readonly attribute DOMString notationName;
// Introduced in DOM Level 3:
attribute DOMString actualEncoding;
// Introduced in DOM Level 3:
attribute DOMString encoding;
// Introduced in DOM Level 3:
attribute DOMString version;
};
interface EntityReference : Node {
};
interface ProcessingInstruction : Node {
readonly attribute DOMString target;
attribute DOMString data;
};
interface DocumentFragment : Node {
};
interface Document : Node {
// Modified in DOM Level 3:
readonly attribute DocumentType doctype;
readonly attribute DOMImplementation implementation;
readonly attribute Element documentElement;
Element createElement(in DOMString tagName)
raises(DOMException);
DocumentFragment createDocumentFragment();
Text createTextNode(in DOMString data);
Comment createComment(in DOMString data);
CDATASection createCDATASection(in DOMString data)
raises(DOMException);
ProcessingInstruction createProcessingInstruction(in DOMString target,
in DOMString data)
raises(DOMException);
Attr createAttribute(in DOMString name)
raises(DOMException);
EntityReference createEntityReference(in DOMString name)
raises(DOMException);
NodeList getElementsByTagName(in DOMString tagname);
// Introduced in DOM Level 2:
Node importNode(in Node importedNode,
in boolean deep)
raises(DOMException);
// Introduced in DOM Level 2:
Element createElementNS(in DOMString namespaceURI,
in DOMString qualifiedName)
raises(DOMException);
// Introduced in DOM Level 2:
Attr createAttributeNS(in DOMString namespaceURI,
in DOMString qualifiedName)
raises(DOMException);
// Introduced in DOM Level 2:
NodeList getElementsByTagNameNS(in DOMString namespaceURI,
in DOMString localName);
// Introduced in DOM Level 2:
Element getElementById(in DOMString elementId);
// Introduced in DOM Level 3:
attribute DOMString actualEncoding;
// Introduced in DOM Level 3:
attribute DOMString encoding;
// Introduced in DOM Level 3:
attribute boolean standalone;
// Introduced in DOM Level 3:
attribute DOMString version;
// Introduced in DOM Level 3:
attribute boolean strictErrorChecking;
// Introduced in DOM Level 3:
attribute DOMErrorHandler errorHandler;
// Introduced in DOM Level 3:
attribute DOMString documentURI;
// Introduced in DOM Level 3:
Node adoptNode(in Node source)
raises(DOMException);
// Introduced in DOM Level 3:
void normalizeDocument();
// Introduced in DOM Level 3:
boolean canSetNormalizationFeature(in DOMString name,
in boolean state);
// Introduced in DOM Level 3:
void setNormalizationFeature(in DOMString name,
in boolean state)
raises(DOMException);
// Introduced in DOM Level 3:
boolean getNormalizationFeature(in DOMString name)
raises(DOMException);
// Introduced in DOM Level 3:
Node renameNode(in Node n,
in DOMString namespaceURI,
in DOMString name)
raises(DOMException);
};
};
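The DOM Level 2 Traversal and Range specification consists of two modules: Traversal, for walking the document structure, and Range, for selecting and manipulating spans of a document.

DOM2 Traversal module
An application can call hasFeature("Traversal", "2.0") on the DOMImplementation interface to check whether the parser supports traversal.

A NodeIterator presents the nodes of a document subtree as a flat, ordered sequence.

The iterator's position can move forward and backward through that sequence, but because the iterator has no notion of hierarchy it cannot move up or down.

A TreeWalker, by contrast, preserves the hierarchical relationships of the subtree, so its current node can move up and down between levels.

A NodeFilter often accompanies a NodeIterator or TreeWalker and is used to select the nodes that are logically visible.

Usage example: call createNodeIterator on the DocumentTraversal interface to create a NodeIterator, passing a whatToShow mask to specify which node types are visible, then call nextNode() repeatedly to walk the document tree.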

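// The fourth argument (entityReferenceExpansion) is required by the DOM2 IDL.
NodeIterator iter =
    ((DocumentTraversal) document).createNodeIterator(
        root, NodeFilter.SHOW_ELEMENT, null, true);
Node n;
// nextNode() returns null once the sequence is exhausted.
while ((n = iter.nextNode()) != null)
    printMe(n);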
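A self-contained Java version contrasting NodeIterator and TreeWalker, assuming the JDK's built-in JAXP parser, whose Document objects also implement DocumentTraversal (the casts rely on that). SKIP_TBODY is a hypothetical NodeFilter that returns FILTER_SKIP for TBODY: the iterator simply omits that node, while the walker treats TBODY's children as if they were TABLE's.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class TraversalDemo {
    static final String TABLE = "<TABLE><TBODY>"
            + "<TR><TD>Shady Grove</TD><TD>Aeolian</TD></TR>"
            + "<TR><TD>Over the River, Charlie</TD><TD>Dorian</TD></TR>"
            + "</TBODY></TABLE>";

    // FILTER_SKIP hides TBODY itself but keeps its children visible.
    static final NodeFilter SKIP_TBODY = n -> "TBODY".equals(n.getNodeName())
            ? NodeFilter.FILTER_SKIP : NodeFilter.FILTER_ACCEPT;

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // NodeIterator: a flat, document-order sequence of the visible elements.
    static List<String> iterate(String xml, NodeFilter filter) {
        Document doc = parse(xml);
        NodeIterator it = ((DocumentTraversal) doc).createNodeIterator(
                doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, filter, true);
        List<String> names = new ArrayList<>();
        for (Node n = it.nextNode(); n != null; n = it.nextNode()) {
            names.add(n.getNodeName());
        }
        it.detach();
        return names;
    }

    // TreeWalker: moves through the filtered hierarchy, down and back up.
    static String walk(String xml) {
        Document doc = parse(xml);
        TreeWalker walker = ((DocumentTraversal) doc).createTreeWalker(
                doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, SKIP_TBODY, true);
        Node row = walker.firstChild();  // TBODY is skipped, so this is the first TR
        Node cell = walker.firstChild(); // first TD of that row
        Node up = walker.parentNode();   // back to the same TR
        return row.getNodeName() + "," + cell.getTextContent() + "," + (up == row);
    }

    public static void main(String[] args) {
        System.out.println(parse(TABLE).getImplementation().hasFeature("Traversal", "2.0"));
        System.out.println(iterate(TABLE, null));
        System.out.println(iterate(TABLE, SKIP_TBODY));
        System.out.println(walk(TABLE));
    }
}
```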
Formal description of the DOM2 Traversal specification:
module traversal
{
typedef dom::Node Node;
interface NodeFilter;
// Introduced in DOM Level 2:
interface NodeIterator {
readonly attribute Node root;
readonly attribute unsigned long whatToShow;
readonly attribute NodeFilter filter;
readonly attribute boolean expandEntityReferences;
Node nextNode()
raises(dom::DOMException);
Node previousNode()
raises(dom::DOMException);
void detach();
};
// Introduced in DOM Level 2:
interface NodeFilter {
// Constants returned by acceptNode
const short FILTER_ACCEPT = 1;
const short FILTER_REJECT = 2;
const short FILTER_SKIP = 3;
// Constants for whatToShow
const unsigned long SHOW_ALL = 0xFFFFFFFF;
const unsigned long SHOW_ELEMENT = 0x00000001;
const unsigned long SHOW_ATTRIBUTE = 0x00000002;
const unsigned long SHOW_TEXT = 0x00000004;
const unsigned long SHOW_CDATA_SECTION = 0x00000008;
const unsigned long SHOW_ENTITY_REFERENCE = 0x00000010;
const unsigned long SHOW_ENTITY = 0x00000020;
const unsigned long SHOW_PROCESSING_INSTRUCTION= 0x00000040;
const unsigned long SHOW_COMMENT = 0x00000080;
const unsigned long SHOW_DOCUMENT = 0x00000100;
const unsigned long SHOW_DOCUMENT_TYPE = 0x00000200;
const unsigned long SHOW_DOCUMENT_FRAGMENT = 0x00000400;
const unsigned long SHOW_NOTATION = 0x00000800;
short acceptNode(in Node n);
};
// Introduced in DOM Level 2:
interface TreeWalker {
readonly attribute Node root;
readonly attribute unsigned long whatToShow;
readonly attribute NodeFilter filter;
readonly attribute boolean expandEntityReferences;
attribute Node currentNode;
Node parentNode();
Node firstChild();
Node lastChild();
Node previousSibling();
Node nextSibling();
Node previousNode();
Node nextNode();
};
// Introduced in DOM Level 2:
interface DocumentTraversal {
NodeIterator createNodeIterator(in Node root,
in unsigned long whatToShow,
in NodeFilter filter,
in boolean entityReferenceExpansion)
raises(dom::DOMException);
TreeWalker createTreeWalker(in Node root,
in unsigned long whatToShow,
in NodeFilter filter,
in boolean entityReferenceExpansion)
raises(dom::DOMException);
};
};
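DOM2 Range module: used to select and manipulate spans of a document's content.

The outermost ancestor container of a range's boundary points, called the root container, must be a Document, DocumentFragment, or Attr node; the tree rooted at the root container is called the range's context tree.

We can modify and compare boundary points; delete, extract (cut), clone (copy), and insert (paste) range contents; surround a range with a new node; and keep ranges consistent with the document after it has been modified.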
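The Range operations map onto Java's org.w3c.dom.ranges package. This sketch assumes the JDK's built-in parser, whose Document objects implement DocumentRange; RangeDemo and its sample paragraph are hypothetical. It selects one element by boundary points, cuts it out with extractContents, and pastes a new node at the collapsed range with insertNode.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.ranges.DocumentRange;
import org.w3c.dom.ranges.Range;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class RangeDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Returns "selected text|extracted text|paragraph text after the edits".
    static String demo() {
        Document doc = parse("<p>Shady <b>Grove</b> Aeolian</p>");
        Element p = doc.getDocumentElement();
        Range range = ((DocumentRange) doc).createRange();
        // Boundary points (p, 1) and (p, 2) select exactly the <b> child.
        range.setStart(p, 1);
        range.setEnd(p, 2);
        String selected = range.toString();
        // "Cut": move the selected nodes into a fragment; the range collapses.
        DocumentFragment cut = range.extractContents();
        // "Paste" a replacement node at the collapsed boundary point.
        range.insertNode(doc.createTextNode("X"));
        String result = selected + "|" + cut.getTextContent() + "|" + p.getTextContent();
        range.detach();
        return result;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```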

Formal description of DOM2 Range:
module ranges
{
typedef dom::Node Node;
typedef dom::DocumentFragment DocumentFragment;
typedef dom::DOMString DOMString;
// Introduced in DOM Level 2:
exception RangeException {
unsigned short code;
};
// RangeExceptionCode
const unsigned short BAD_BOUNDARYPOINTS_ERR = 1;
const unsigned short INVALID_NODE_TYPE_ERR = 2;
// Introduced in DOM Level 2:
interface Range {
readonly attribute Node startContainer;
readonly attribute long startOffset;
readonly attribute Node endContainer;
readonly attribute long endOffset;
readonly attribute boolean collapsed;
readonly attribute Node commonAncestorContainer;
void setStart(in Node refNode,
in long offset)
raises(RangeException, dom::DOMException);
void setEnd(in Node refNode,
in long offset)
raises(RangeException, dom::DOMException);
void setStartBefore(in Node refNode)
raises(RangeException, dom::DOMException);
void setStartAfter(in Node refNode)
raises(RangeException, dom::DOMException);
void setEndBefore(in Node refNode)
raises(RangeException, dom::DOMException);
void setEndAfter(in Node refNode)
raises(RangeException, dom::DOMException);
void collapse(in boolean toStart)
raises(dom::DOMException);
void selectNode(in Node refNode)
raises(RangeException, dom::DOMException);
void selectNodeContents(in Node refNode)
raises(RangeException, dom::DOMException);
// CompareHow
const unsigned short START_TO_START = 0;
const unsigned short START_TO_END = 1;
const unsigned short END_TO_END = 2;
const unsigned short END_TO_START = 3;
short compareBoundaryPoints(in unsigned short how,
in Range sourceRange)
raises(dom::DOMException);
void deleteContents()
raises(dom::DOMException);
DocumentFragment extractContents()
raises(dom::DOMException);
DocumentFragment cloneContents()
raises(dom::DOMException);
void insertNode(in Node newNode)
raises(dom::DOMException, RangeException);
void surroundContents(in Node newParent)
raises(dom::DOMException, RangeException);
Range cloneRange()
raises(dom::DOMException);
DOMString toString()
raises(dom::DOMException);
void detach()
raises(dom::DOMException);
};
// Introduced in DOM Level 2:
interface DocumentRange {
Range createRange();
};
};
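DOM Level 3 Abstract Schemas and Load and Save specification:
Describes how DTDs and XML Schemas are represented and manipulated.

It is not closely related to our requirements and is omitted here.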

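SAX Interfaces:
As mentioned earlier, DOM processing requires reading the entire XML document and building a DOM tree in memory, creating a Node object for every node in the tree.

For small documents this causes no problems, but as documents grow, DOM processing becomes slow and resource-hungry, and memory use grows with document size, often to several times the size of the file itself.

SAX is conceptually quite different from DOM: it is event-driven, meaning that reading the document and parsing it happen at the same time.

Before parsing begins, you register a ContentHandler with the XMLReader. The ContentHandler acts as an event listener and defines many callback methods; for example, startDocument() specifies what to do when the parser reaches the start of the document.

Whenever the XMLReader reads a relevant piece of content, it raises the corresponding event and delegates handling to the ContentHandler by calling the matching method.

To process a particular kind of XML document, you therefore write a class that implements ContentHandler and handles the events you care about.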
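A minimal Java sketch of this pattern using the JDK's SAX parser; CellCollector is a hypothetical handler. It extends org.xml.sax.helpers.DefaultHandler, which provides empty implementations of every ContentHandler method, so only the callbacks of interest are overridden before the handler is registered with an XMLReader.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SaxDemo {
    static final String TABLE = "<TABLE><TBODY>"
            + "<TR><TD>Shady Grove</TD><TD>Aeolian</TD></TR>"
            + "<TR><TD>Over the River, Charlie</TD><TD>Dorian</TD></TR>"
            + "</TBODY></TABLE>";

    // Hypothetical handler: counts elements and collects the text of each <TD>.
    static class CellCollector extends DefaultHandler {
        int elementCount;
        final List<String> cells = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private boolean inCell;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elementCount++;
            if (qName.equals("TD")) {
                inCell = true;
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text may arrive in several chunks, so accumulate it.
            if (inCell) text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("TD")) {
                inCell = false;
                cells.add(text.toString());
            }
        }
    }

    static CellCollector collect(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            CellCollector handler = new CellCollector();
            reader.setContentHandler(handler); // register the event listener
            reader.parse(new InputSource(new StringReader(xml)));
            return handler;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        CellCollector h = collect(TABLE);
        System.out.println(h.elementCount + " elements, cells " + h.cells);
    }
}
```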

SAX interfaces:
ContentHandler
Primary SAX interface that models the Infoset's core information items
ErrorHandler
Models fatal errors, errors, and warnings (as per XML 1.0)
DTDHandler
Models unparsed entities and notations
EntityResolver
Allows an application to perform custom resolution of external entity identifiers
LexicalHandler
Models noncore lexical information (comments, CDATA sections, entity references, and so on)
DeclHandler
Models element and attribute declarations
XMLReader
Makes it possible to tie the previously listed interfaces together in order to process a complete document information item
Attributes
Models a collection of attributes
Locator
Provides contextual information about the caller
Members of the ContentHandler interface:
startDocument
Signals the beginning of a document.
endDocument
Signals the end of a document.
startElement
Signals the beginning of an element.
endElement
Signals the end of an element.
startPrefixMapping
Signals the beginning of a prefix-URI Namespace mapping scope.
endPrefixMapping
Signals the end of a prefix URI mapping scope.
characters
Signals character data.
ignorableWhitespace
Signals ignorable whitespace in element content. Validating parsers must report such whitespace through this method; a non-validating parser may never call it.
processingInstruction
Signals a processing instruction.
skippedEntity
Signals a skipped entity.
documentLocator (setDocumentLocator in the Java SAX API)
Receives a Locator interface reference, which provides the column number, line number, public ID, and system ID of the current item.
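SAX is efficient and well suited to filtering and searching large files.

However, it provides no context information, cannot traverse backward, and cannot modify the structure or content of the XML document.

For example, SAX does not remember the tags it has already encountered. In startElement() you learn the element's name and attributes, but structural information, such as how tags are nested, the name of the parent element, or whether the element has children, is not available and must be tracked by the application.

Applications usually exploit the fact that XML tags are strictly nested and use a stack to record this context.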
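The stack technique can be sketched in Java as follows (JDK SAX parser assumed; PathTracker is hypothetical). Each startElement pushes the element name and each endElement pops it, so at any moment the stack holds the path from the root to the current element. For simplicity the sketch assumes text appears only in leaf elements and that elements with children contain only whitespace.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SaxPathDemo {
    // The TABLE fragment from the text, line breaks included.
    static final String TABLE = "<TABLE>\n<TBODY>\n"
            + "<TR>\n<TD>Shady Grove</TD>\n<TD>Aeolian</TD>\n</TR>\n"
            + "<TR>\n<TD>Over the River, Charlie</TD>\n<TD>Dorian</TD>\n</TR>\n"
            + "</TBODY>\n</TABLE>";

    // Hypothetical handler: keeps open elements on a stack and records each
    // leaf's text together with its full element path.
    static class PathTracker extends DefaultHandler {
        final List<String> paths = new ArrayList<>();
        private final Deque<String> open = new ArrayDeque<>();
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            open.addLast(qName); // push: we are now inside qName
            text.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String value = text.toString().trim();
            if (!value.isEmpty()) {
                paths.add(String.join("/", open) + ": " + value);
            }
            open.removeLast(); // pop: strict nesting guarantees this is qName
            text.setLength(0);
        }
    }

    static List<String> paths(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            PathTracker tracker = new PathTracker();
            reader.setContentHandler(tracker);
            reader.parse(new InputSource(new StringReader(xml)));
            return tracker.paths;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        paths(TABLE).forEach(System.out::println);
    }
}
```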

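Summary: 1. DOM has a mature standard. It follows the construction approach of a traditional compiler front end, is object-oriented, and exposes a large amount of information, enough to meet all the needs of an application, but it is also expensive.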
