This section defines a set of interfaces for loading and saving
document objects as defined in [DOM Level 2 Core] or
newer. The functionality specified in this section (the Load
and Save functionality) is sufficient to allow software
developers and Web script authors to load and save XML content
inside conforming products. The DOM Load and Save API also allows filtering of XML content
using only DOM API calls; access and manipulation of the
Document
is defined in [DOM Level 2 Core] or
newer.
The proposal for loading is influenced by the Java APIs for XML Processing [JAXP] and by SAX2 [SAX].
The interfaces involved with the loading and saving of XML documents are:
DOMImplementationLS
-- An extended
DOMImplementation
interface that provides the
factory methods for creating the objects required for
loading and saving.
LSParser
-- An interface for parsing data into
DOM documents.
LSInput
-- Encapsulates information about the
data to be loaded.
LSResourceResolver
-- Provides a way for
applications to redirect references to external resources
when parsing.
LSParserFilter
-- Provides the ability to
examine and optionally remove nodes as they are being
processed while parsing.
LSSerializer
-- An interface for serializing
DOM documents or nodes.
LSOutput
-- Encapsulates information about the
destination for the data to be output.
LSSerializerFilter
-- Provides the ability to
examine and filter DOM nodes as they are being processed for
the serialization.
To ensure interoperability, this specification defines the following basic types used in various DOM modules. Although the DOM uses these basic types in its interfaces, bindings may use different types, and normative bindings are given only for Java and ECMAScript in this specification.
LSInputStream
This type is used to represent a sequence of input bytes.
An LSInputStream represents a reference to a byte stream source of an XML input.
typedef Object LSInputStream;
Note:
For Java, LSInputStream
is bound to the
java.io.InputStream
type. For ECMAScript,
LSInputStream
is bound to Object
.
LSOutputStream
This type is used to represent a sequence of output bytes.
An LSOutputStream represents a byte stream destination for the XML output.
typedef Object LSOutputStream;
Note:
For Java, LSOutputStream
is bound to the
java.io.OutputStream
type. For ECMAScript,
LSOutputStream
is bound to Object
.
LSReader
This type is used to represent a sequence of input characters in 16-bit units. The encoding used for the characters is UTF-16, as defined in [Unicode] and in [ISO/IEC 10646].
An LSReader represents a character stream for the XML input.
typedef Object LSReader;
Note:
For Java, LSReader
is bound to the
java.io.Reader
type. For ECMAScript,
LSReader
is not bound, and
therefore has no recommended meaning in ECMAScript.
LSWriter
This type is used to represent a sequence of output characters in 16-bit units. The encoding used for the characters is UTF-16, as defined in [Unicode] and in [ISO/IEC 10646].
An LSWriter represents a character stream for the XML output.
typedef Object LSWriter;
Note:
For Java, LSWriter
is bound to the
java.io.Writer
type. For ECMAScript,
LSWriter
is not bound, and
therefore has no recommended meaning in ECMAScript.
The interfaces within this section are considered fundamental, and must be fully implemented by all conforming implementations of the DOM Load and Save module.
A DOM application may use the hasFeature(feature,
version)
method of the DOMImplementation
interface with parameter values "LS"
(or
"LS-Async"
) and "3.0"
(respectively)
to determine whether or not these interfaces are supported by
the implementation. In order to fully support them, an
implementation must also support the "Core" feature defined in
[DOM Level 2 Core].
A DOM application may use the hasFeature(feature,
version)
method of the DOMImplementation
interface with parameter values "LS-Async"
and
"3.0"
(respectively) to determine whether or not
the asynchronous mode is supported by the implementation. In
order to fully support the asynchronous mode, an
implementation must also support the "LS"
feature
defined in this section.
For additional information about conformance, please see the DOM Level 3 Core specification [DOM Level 3 Core].
Parse or write operations may throw an LSException if the processing is stopped. The processing can be stopped due to a DOMError with a severity of DOMError.SEVERITY_FATAL_ERROR or a non-recovered DOMError.SEVERITY_ERROR, or if DOMErrorHandler.handleError() returned false.
Note:
As suggested in the definition of the constants in the
DOMError
interface, a DOM implementation may choose
to continue after a fatal error, but the resulting DOM tree is
then implementation dependent.
exception LSException {
  unsigned short code;
};
// LSExceptionCode
const unsigned short PARSE_ERR     = 81;
const unsigned short SERIALIZE_ERR = 82;
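An integer indicating the type of error generated.
PARSE_ERR
If an attempt was made to load a document, or an XML fragment, using LSParser and the processing has been stopped.
SERIALIZE_ERR
If an attempt was made to serialize a Node using LSSerializer and the processing has been stopped.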
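As an informative illustration of the exception code, the following Java sketch parses an ill-formed string and reports the LSException code that results. It assumes that the JAXP default parser of the Java platform exposes the "LS" feature; the class name LsParseErrorDemo and its helper methods are hypothetical and not part of this specification. An implementation may additionally log the underlying error.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSException;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

final class LsParseErrorDemo {

    // Assumption: the platform's default JAXP implementation supports "LS" 3.0.
    private static DOMImplementationLS loadAndSave() {
        try {
            DOMImplementation impl = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().getDOMImplementation();
            return (DOMImplementationLS) impl.getFeature("LS", "3.0");
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no JAXP parser available", e);
        }
    }

    /** Returns the LSException code raised while parsing, or 0 if parsing succeeded. */
    static short parseErrorCode(String xml) {
        DOMImplementationLS ls = loadAndSave();
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        LSInput input = ls.createLSInput();
        input.setStringData(xml);
        try {
            parser.parse(input);
            return 0;
        } catch (LSException e) {
            // The code distinguishes parse failures from serialization failures.
            return e.code;
        }
    }

    public static void main(String[] args) {
        short code = parseErrorCode("<a><b></a>"); // mismatched end tag
        System.out.println("PARSE_ERR raised: " + (code == LSException.PARSE_ERR));
    }
}
```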
DOMImplementationLS
contains the factory methods for
creating Load and Save objects.
The expectation is that an instance of the
DOMImplementationLS
interface can be obtained by
using binding-specific casting methods on an instance of the
DOMImplementation
interface or, if the
Document
supports the feature "Core"
version "3.0"
defined in [DOM Level 3 Core], by using the method
DOMImplementation.getFeature
with parameter values
"LS"
(or "LS-Async"
) and
"3.0"
(respectively).
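The feature test and lookup described above can be sketched in Java as follows. This is an informative sketch only: it assumes that a DOMImplementation is obtained through JAXP's DocumentBuilderFactory, which is one possible bootstrap among several, and the class name LsBootstrap is hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.ls.DOMImplementationLS;

final class LsBootstrap {

    /** Returns the Load and Save factory, or null if "LS" 3.0 is not supported. */
    static DOMImplementationLS loadAndSave() {
        DOMImplementation impl;
        try {
            // Assumption: JAXP is used to reach a DOMImplementation instance.
            impl = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().getDOMImplementation();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no JAXP parser available", e);
        }
        // Feature detection first, then retrieval through getFeature.
        if (!impl.hasFeature("LS", "3.0")) {
            return null;
        }
        Object feature = impl.getFeature("LS", "3.0");
        return (feature instanceof DOMImplementationLS) ? (DOMImplementationLS) feature : null;
    }

    public static void main(String[] args) {
        DOMImplementationLS ls = loadAndSave();
        System.out.println(ls != null ? "LS 3.0 supported" : "LS 3.0 not supported");
    }
}
```

Checking hasFeature before calling getFeature keeps the unsupported case explicit instead of relying on a failed cast.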
interface DOMImplementationLS {

  // DOMImplementationLSMode
  const unsigned short MODE_SYNCHRONOUS  = 1;
  const unsigned short MODE_ASYNCHRONOUS = 2;

  LSParser     createLSParser(in unsigned short mode,
                              in DOMString schemaType)
                                  raises(DOMException);
  LSSerializer createLSSerializer();
  LSInput      createLSInput();
  LSOutput     createLSOutput();
};
Integer parser mode constants.
MODE_ASYNCHRONOUS
Create an asynchronous LSParser.
MODE_SYNCHRONOUS
Create a synchronous LSParser.
createLSInput
Create a new empty input source object where LSInput.characterStream, LSInput.byteStream, LSInput.stringData, LSInput.systemId, LSInput.publicId, LSInput.baseURI, and LSInput.encoding are null, and LSInput.certifiedText is false.
The newly created input object.
createLSOutput
Create a new empty output destination object where LSOutput.characterStream, LSOutput.byteStream, LSOutput.systemId, and LSOutput.encoding are null.
The newly created output object.
createLSParser
Create a new LSParser. The newly constructed parser may then be configured by means of its DOMConfiguration object, and used to parse documents by means of its parse method.
mode of type unsigned short
The mode argument is either MODE_SYNCHRONOUS or MODE_ASYNCHRONOUS. If mode is MODE_SYNCHRONOUS, the LSParser that is created will operate in synchronous mode; if it is MODE_ASYNCHRONOUS, the LSParser that is created will operate in asynchronous mode.
schemaType of type DOMString
An absolute URI representing the type of the schema language used during the load of a Document using the newly created LSParser. Note that no lexical checking is done on the absolute URI. To create an LSParser for any kind of schema type (i.e. the LSParser will be free to use any schema found), use the value null.
Note:
For W3C XML Schema [XML Schema Part 1], applications must use the value
"http://www.w3.org/2001/XMLSchema"
. For XML
DTD [XML 1.0], applications
must use the value
"http://www.w3.org/TR/REC-xml"
. Other Schema
languages are outside the scope of the W3C and therefore
should recommend an absolute URI in order to use this
method.
The newly created Note:
By default, the newly created |
|
NOT_SUPPORTED_ERR: Raised if the requested mode or schema type is not supported. |
createLSSerializer
LSSerializer
object.
The newly created Note:
By default, the newly created |
An interface to an object that is able to build, or augment, a DOM tree from various input sources.
LSParser
provides an API for parsing XML and
building the corresponding DOM document structure. A
LSParser
instance can be obtained by invoking the
DOMImplementationLS.createLSParser()
method.
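As an informative sketch, a synchronous LSParser can be created and used on in-memory data as shown below. The sketch assumes the Java binding and a JAXP bootstrap; the class name LsParseStringDemo and the method rootName are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

final class LsParseStringDemo {

    // Assumption: the platform's default JAXP implementation supports "LS" 3.0.
    private static DOMImplementationLS loadAndSave() {
        try {
            DOMImplementation impl = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().getDOMImplementation();
            return (DOMImplementationLS) impl.getFeature("LS", "3.0");
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no JAXP parser available", e);
        }
    }

    /** Parses the given XML text and returns the name of the document element. */
    static String rootName(String xml) {
        DOMImplementationLS ls = loadAndSave();
        // A null schemaType leaves the parser free to use any schema found.
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        LSInput input = ls.createLSInput();
        input.setStringData(xml);
        Document doc = parser.parse(input);
        return doc.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) {
        System.out.println(rootName("<root><child/></root>"));
    }
}
```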
As specified in [DOM Level 3 Core], when a document is first made available via the LSParser:
value
and
nodeValue
attributes of an Attr
node initially return the XML 1.0 normalized
value. However, if the parameters "validate-if-schema"
and "datatype-normalization"
are set to true
, depending on the attribute
normalization used, the attribute values may differ from the
ones obtained by the XML 1.0 attribute
normalization. If the parameter "datatype-normalization"
is set to false
, the XML 1.0 attribute
normalization is guaranteed to occur, and if the attributes
list does not contain namespace declarations, the
attributes
attribute on Element
node represents the property
[attributes] defined in [XML Information Set].
Asynchronous LSParser
objects are expected to also
implement the events::EventTarget
interface so that
event listeners can be registered on asynchronous
LSParser
objects.
Events supported by asynchronous LSParser objects are:
load
The LSParser finishes loading the document. See also the definition of the LSLoadEvent interface.
progress
The LSParser signals progress as data is parsed. See also the definition of the LSProgressEvent interface.
Note:
All events defined in this specification use the namespace URI
"http://www.w3.org/2002/DOMLS"
.
While parsing an input source, errors are reported to the application through the error handler (LSParser.domConfig's "error-handler" parameter). This specification does not attempt to define all possible errors that can occur while parsing XML, or any other markup, but some common error cases are defined. The types (DOMError.type) of errors and warnings defined by this specification are:
"check-character-normalization-failure" [error]
"doctype-not-allowed" [fatal]
true
and a doctype is encountered.
"no-input-specified" [fatal]
LSInput
object.
"pi-base-uri-not-preserved" [warning]
false
and the following XML file is
parsed:
<!DOCTYPE root [
  <!ENTITY e SYSTEM 'subdir/myentity.ent'>
]>
<root> &e; </root>
subdir/myentity.ent
contains:
<one>
  <two/>
</one>
<?pi 3.14159?>
<more/>
"unbound-prefix-in-entity" [warning]
true
and an unbound namespace
prefix is encountered in an entity's replacement
text. Raising this warning is not enforced since some
existing parsers may not recognize unbound namespace
prefixes in the replacement text of entities.
"unknown-character-denormalization" [fatal]
false
and a character is
encountered for which the processor cannot determine the
normalization properties.
"unsupported-encoding" [fatal]
"unsupported-media-type" [fatal]
true
and an unsupported media type
is encountered.
In addition to raising the defined errors and warnings, implementations are expected to raise implementation specific errors and warnings for any other error and warning cases such as IO errors (file not found, permission denied,...), XML well-formedness errors, and so on.
interface LSParser {
  readonly attribute DOMConfiguration domConfig;
           attribute LSParserFilter   filter;
  readonly attribute boolean          async;
  readonly attribute boolean          busy;
  Document parse(in LSInput input)
                     raises(DOMException, LSException);
  Document parseURI(in DOMString uri)
                     raises(DOMException, LSException);

  // ACTION_TYPES
  const unsigned short ACTION_APPEND_AS_CHILDREN = 1;
  const unsigned short ACTION_REPLACE_CHILDREN   = 2;
  const unsigned short ACTION_INSERT_BEFORE      = 3;
  const unsigned short ACTION_INSERT_AFTER       = 4;
  const unsigned short ACTION_REPLACE            = 5;

  Node parseWithContext(in LSInput input,
                        in Node contextArg,
                        in unsigned short action)
                     raises(DOMException, LSException);
  void abort();
};
A set of possible actions for the parseWithContext
method.
ACTION_APPEND_AS_CHILDREN
Element
or a
DocumentFragment
.
ACTION_INSERT_AFTER
Element
or a DocumentFragment
.
ACTION_INSERT_BEFORE
Element
or a DocumentFragment
.
ACTION_REPLACE
Element
or a DocumentFragment
.
ACTION_REPLACE_CHILDREN
Element
, a
Document
, or a DocumentFragment
.
async of type boolean, readonly
true if the LSParser is asynchronous, false if it is synchronous.
busy of type boolean, readonly
true if the LSParser is currently busy loading a document, otherwise false.
domConfig of type DOMConfiguration, readonly
The DOMConfiguration object used when parsing an input source. This DOMConfiguration is specific to the parse operation. No parameter values from this DOMConfiguration object are passed automatically to the DOMConfiguration object on the Document that is created, or used, by the parse operation. The DOM application is responsible for passing any needed parameter values from this DOMConfiguration object to the DOMConfiguration object referenced by the Document object.
DOMConfiguration
objects for LSParser
add or modify the following parameters:
"charset-overrides-xml-encoding"
true
LSInput
overrides any encoding from
the protocol.
false
"disallow-doctype"
true
false
"ignore-unknown-character-denormalizations"
true
false
"infoset"
DOMConfiguration
for
a description of this parameter. Unlike in [DOM Level 3 Core], this parameter will default to
true
for LSParser
.
"namespaces"
true
false
"resource-resolver"
LSResourceResolver
object, or null. If the value of this parameter is not
null when an external resource (such as an external XML
entity or an XML schema location) is encountered, the
implementation will request that the
LSResourceResolver
referenced in this
parameter resolves the resource.
"supported-media-types-only"
true
false
"validate"
DOMConfiguration
for a
description of this parameter. Unlike in [DOM Level 3 Core], the processing of the internal subset is
always accomplished, even if this parameter is set to
false
.
"validate-if-schema"
DOMConfiguration
for a
description of this parameter. Unlike in [DOM Level 3 Core], the processing of the internal subset is
always accomplished, even if this parameter is set to
false
.
"well-formed"
DOMConfiguration
for a
description of this parameter. Unlike in [DOM Level 3 Core], this parameter cannot be set to
false
.
filter
of type LSParserFilter
DOMConfiguration
parameters have been applied. For
example, if "validate"
is set to true
, the validation is done before
invoking the filter.
abort
LSParser
. If the
LSParser
is currently not busy, a call to this
method does nothing.
parse
LSInput
.
|
If the |
|
INVALID_STATE_ERR: Raised if the |
PARSE_ERR: Raised if the |
parseURI
uri
of type
DOMString
|
If the |
|
INVALID_STATE_ERR: Raised if the |
PARSE_ERR: Raised if the |
parseWithContext
LSInput
and insert the content into an existing
document at the position specified with the
context
and action
arguments. When
parsing the input stream, the context node (or its parent,
depending on where the result will be inserted) is used for
resolving unbound namespace prefixes. The context node's
ownerDocument
node (or the node itself if the
node is of type DOCUMENT_NODE
) is used to resolve
default attributes and entity references.
Document
node and the
action is ACTION_REPLACE_CHILDREN
, then the
document that is passed as the context node will be changed
such that its xmlEncoding
,
documentURI
, xmlVersion
,
inputEncoding
, xmlStandalone
, and all
other such attributes are set to what they would be set to if
the input source was parsed using
LSParser.parse()
.
LSParser
is asynchronous
(LSParser.async
is true
).
ErrorHandler
instance associated with the
"error-handler"
parameter of the DOMConfiguration
.
parseWithContext
, the values of the
following configuration parameters will be ignored and their
default values will always be used instead: "validate",
"validate-if-schema",
and "element-content-whitespace". Other
parameters will be treated normally, and the parser is
expected to call the LSParserFilter
just as if a
whole document was parsed.
input
of type
LSInput
LSInput
from which the source document is
to be read. The source document must be an XML fragment,
i.e. anything except a complete XML document (except in the
case where the context node is of type
DOCUMENT_NODE
, and the action is
ACTION_REPLACE_CHILDREN
), a DOCTYPE (internal
subset), entity declaration(s), notation declaration(s), or
XML or text declaration(s).
contextArg
of type
Node
Document
node, a DocumentFragment
node, or a node of a
type that is allowed as a child of an Element
node, e.g. it cannot be an Attribute
node.
action
of type
unsigned short
ACTION_TYPES
above.
|
Return the node that is the result of the parse operation. If the result is more than one top-level node, the first one is returned. |
|
HIERARCHY_REQUEST_ERR: Raised if the content cannot
replace, be inserted before, after, or as a
child of the context node (see also
NOT_SUPPORTED_ERR: Raised if the
NO_MODIFICATION_ALLOWED_ERR: Raised if the context node is a read-only node and the content is being appended to its child list, or if the parent node of the context node is a read-only node and the content is being inserted in its child list.
INVALID_STATE_ERR: Raised if the |
PARSE_ERR: Raised if the |
This interface represents an input source for data.
This interface allows an application to encapsulate information about an input source in a single object, which may include a public identifier, a system identifier, a byte stream (possibly with a specified encoding), a base URI, and/or a character stream.
The exact definitions of a byte stream and a character stream are binding dependent.
The application is expected to provide objects that implement
this interface whenever such objects are needed. The application
can either provide its own objects that implement this
interface, or it can use the generic factory method
DOMImplementationLS.createLSInput()
to create
objects that implement this interface.
The LSParser will use the LSInput object to determine how to read data. The LSParser will look at the different inputs specified in the LSInput in the following order to know which one to read from; the first one that is not null and not an empty string will be used:
LSInput.characterStream
LSInput.byteStream
LSInput.stringData
LSInput.systemId
LSInput.publicId
If all inputs are null, the LSParser
will report a
DOMError
with its DOMError.type
set to
"no-input-specified"
and its
DOMError.severity
set to
DOMError.SEVERITY_FATAL_ERROR
.
LSInput
objects belong to the application. The DOM
implementation will never modify them (though it may make copies
and modify the copies, if necessary).
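The following informative Java sketch illustrates the input lookup order, assuming that the character stream is consulted before the string data, as in the Java binding's documentation of LSInput. Both inputs are set on the same LSInput object, and the document element name reveals which one the parser read. The class name LsInputPrecedenceDemo is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

final class LsInputPrecedenceDemo {

    // Assumption: the platform's default JAXP implementation supports "LS" 3.0.
    private static DOMImplementationLS loadAndSave() {
        try {
            DOMImplementation impl = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().getDOMImplementation();
            return (DOMImplementationLS) impl.getFeature("LS", "3.0");
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no JAXP parser available", e);
        }
    }

    /** Sets two competing inputs and returns the name of the element actually parsed. */
    static String parsedRootName() {
        DOMImplementationLS ls = loadAndSave();
        LSInput input = ls.createLSInput();
        input.setStringData("<fromString/>");
        // The character stream is expected to win over the string data.
        input.setCharacterStream(new StringReader("<fromReader/>"));
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        Document doc = parser.parse(input);
        return doc.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) {
        System.out.println(parsedRootName());
    }
}
```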
interface LSInput {
  // Depending on the language binding in use,
  // this attribute may not be available.
  attribute LSReader      characterStream;
  attribute LSInputStream byteStream;
  attribute DOMString     stringData;
  attribute DOMString     systemId;
  attribute DOMString     publicId;
  attribute DOMString     baseURI;
  attribute DOMString     encoding;
  attribute boolean       certifiedText;
};
baseURI
of type DOMString
systemId
to an absolute URI.
byteStream
of type LSInputStream
certifiedText
of type boolean
characterStream
of type LSReader
encoding
of type DOMString
publicId
of type DOMString
stringData
of type DOMString
stringData
. If an XML declaration is present, the
value of the encoding attribute will be ignored.
systemId
of type DOMString
encoding
attribute.
baseURI
as the base, if that fails, the behavior is
implementation dependent.
LSResourceResolver
provides a way for applications
to redirect references to external resources.
Applications needing to implement custom handling for external
resources can implement this interface and register their
implementation by setting the "resource-resolver" parameter of
DOMConfiguration
objects attached to
LSParser
and LSSerializer
. It can also
be registered on DOMConfiguration
objects attached to
Document
if the "LS" feature is supported.
The LSParser
will then allow the application to
intercept any external entities, including the external DTD subset
and external parameter entities, before including them. The
top-level document entity is never passed to the
resolveResource
method.
Many DOM applications will not need to implement this interface, but it will be especially useful for applications that build XML documents from databases or other specialized input sources, or for applications that use URNs.
Note:
LSResourceResolver
is based on the SAX2 [SAX] EntityResolver
interface.
interface LSResourceResolver {
  LSInput resolveResource(in DOMString type,
                          in DOMString namespaceURI,
                          in DOMString publicId,
                          in DOMString systemId,
                          in DOMString baseURI);
};
resolveResource
LSParser
will call this method before opening
any external resource, including the external DTD subset,
external entities referenced within the DTD, and external
entities referenced within the document element (however, the
top-level document entity is not passed to this method). The
application may then request that the LSParser
resolve the external resource itself, that it use an alternative
URI, or that it use an entirely different input source.
type
of type
DOMString
"http://www.w3.org/TR/REC-xml"
. For XML
Schema [XML Schema Part 1],
applications must use the value
"http://www.w3.org/2001/XMLSchema"
. Other
types of resources are outside the scope of this
specification and therefore should recommend an absolute
URI in order to use this method.
namespaceURI
of type
DOMString
publicId
of type
DOMString
null
if no public identifier
was supplied or if the resource is not an entity.
systemId
of type
DOMString
null
if no system identifier was supplied.
baseURI
of type
DOMString
null
if there is no base URI.
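As an informative sketch, a resolver that redirects an external entity to in-memory content might look as follows in Java. The entity URI http://example.invalid/greeting.ent, the replacement text, and the class name LsResolverDemo are all hypothetical; the sketch also assumes a JAXP bootstrap and an implementation that permits external general entities.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSResourceResolver;

final class LsResolverDemo {

    // Hypothetical system identifier; it is never fetched because the resolver intercepts it.
    static final String ENTITY_URI = "http://example.invalid/greeting.ent";

    // Assumption: the platform's default JAXP implementation supports "LS" 3.0.
    private static DOMImplementationLS loadAndSave() {
        try {
            DOMImplementation impl = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().getDOMImplementation();
            return (DOMImplementationLS) impl.getFeature("LS", "3.0");
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no JAXP parser available", e);
        }
    }

    /** Parses a document whose external entity is supplied by the resolver. */
    static String parseWithResolver() {
        final DOMImplementationLS ls = loadAndSave();
        LSResourceResolver resolver = (type, namespaceURI, publicId, systemId, baseURI) -> {
            if (systemId == null || !systemId.endsWith("greeting.ent")) {
                return null; // let the parser resolve the resource itself
            }
            LSInput replacement = ls.createLSInput();
            replacement.setStringData("hello");
            return replacement;
        };
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        parser.getDomConfig().setParameter("resource-resolver", resolver);
        LSInput input = ls.createLSInput();
        input.setStringData("<!DOCTYPE root [<!ENTITY e SYSTEM '" + ENTITY_URI
                + "'>]><root>&e;</root>");
        Document doc = parser.parse(input);
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) {
        System.out.println(parseWithResolver());
    }
}
```

Returning null from the resolver for unrecognized identifiers leaves the default resolution behavior in place for every other resource.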
LSParserFilters provide applications with the ability to examine nodes as they are being constructed while parsing. As each node is examined, it may be modified or removed, or the entire parse may be terminated early.
At the time any of the filter methods are called by the parser,
the owner Document and DOMImplementation objects exist and are
accessible. The document element is never passed to the
LSParserFilter
methods, i.e. it is not possible to
filter out the document element. Document
,
DocumentType
, Notation
,
Entity
, and Attr
nodes are never passed
to the acceptNode
method on the filter. The child
nodes of an EntityReference
node are passed to the
filter if the parameter "entities"
is set to false
. Note that, as described by the
parameter "entities",
unexpanded entity reference nodes are never discarded and are
always passed to the filter.
All validity checking while parsing a document occurs on the source document as it appears on the input stream, not on the DOM document as it is built in memory. With filters, the document in memory may be a subset of the document on the stream, and its validity may have been affected by the filtering.
All default attributes must be present on elements when the elements are passed to the filter methods. All other default content must be passed to the filter methods.
DOM applications must not raise exceptions in a filter. The effect of throwing exceptions from a filter is DOM implementation dependent.
interface LSParserFilter {

  // Constants returned by startElement and acceptNode
  const short FILTER_ACCEPT    = 1;
  const short FILTER_REJECT    = 2;
  const short FILTER_SKIP      = 3;
  const short FILTER_INTERRUPT = 4;

  unsigned short startElement(in Element elementArg);
  unsigned short acceptNode(in Node nodeArg);
  readonly attribute unsigned long whatToShow;
};
Constants returned by startElement
and
acceptNode
.
FILTER_ACCEPT
FILTER_INTERRUPT
FILTER_REJECT
FILTER_SKIP
whatToShow of type unsigned long, readonly
Tells the LSParser what types of nodes to show to the method LSParserFilter.acceptNode. If a node is not shown to the filter using this attribute, it is automatically included in the DOM document being built. See NodeFilter for definition of the constants. The constants SHOW_ATTRIBUTE, SHOW_DOCUMENT, SHOW_DOCUMENT_TYPE, SHOW_NOTATION, SHOW_ENTITY, and SHOW_DOCUMENT_FRAGMENT are meaningless here. Those nodes will never be passed to LSParserFilter.acceptNode.
acceptNode
nodeArg
of type
Node
|
|
startElement
The parser will call this method after each Element start tag has been scanned, but before the remainder of the Element is processed. The intent is to allow the element, including any children, to be efficiently skipped. Note that only element nodes are passed to the startElement function.
The element node passed to startElement for filtering will include all of the Element's attributes, but none of the children nodes. The Element may not yet be in place in the document being constructed (it may not have a parent node).
A startElement filter function may access or change the attributes for the Element. Changing namespace declarations will have no effect on namespace resolution by the parser.
elementArg of type Element
|
Returning any other values will result in unspecified behavior. |
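The following informative Java sketch shows a filter that uses startElement to reject an element, together with its children, before its content is processed. It assumes a JAXP bootstrap; the class name LsFilterDemo, the method parseRejecting, and the element names in the usage example are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSParserFilter;
import org.w3c.dom.traversal.NodeFilter;

final class LsFilterDemo {

    // Assumption: the platform's default JAXP implementation supports "LS" 3.0.
    private static DOMImplementationLS loadAndSave() {
        try {
            DOMImplementation impl = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().getDOMImplementation();
            return (DOMImplementationLS) impl.getFeature("LS", "3.0");
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no JAXP parser available", e);
        }
    }

    /** Parses xml while rejecting every non-root element with the given name. */
    static Document parseRejecting(String xml, final String rejectedName) {
        DOMImplementationLS ls = loadAndSave();
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        parser.setFilter(new LSParserFilter() {
            @Override
            public short startElement(Element elementArg) {
                // Rejecting here drops the element and all of its children.
                return rejectedName.equals(elementArg.getNodeName())
                        ? FILTER_REJECT : FILTER_ACCEPT;
            }

            @Override
            public short acceptNode(Node nodeArg) {
                return FILTER_ACCEPT; // keep every completed node that is shown
            }

            @Override
            public int getWhatToShow() {
                return NodeFilter.SHOW_ELEMENT;
            }
        });
        LSInput input = ls.createLSInput();
        input.setStringData(xml);
        return parser.parse(input);
    }

    public static void main(String[] args) {
        Document doc = parseRejecting(
                "<root><keep/><secret><inner/></secret><keep/></root>", "secret");
        System.out.println(doc.getElementsByTagName("secret").getLength());
    }
}
```

Because the document element is never passed to the filter, the sketch cannot remove the root element even if its name matches.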
This interface represents a progress event object that notifies
the application about progress as a document is parsed. It extends
the Event
interface defined in [DOM Level 3 Events].
The units used for the attributes position
and
totalSize
are not specified and can be implementation
and input dependent.
interface LSProgressEvent : events::Event {
  readonly attribute LSInput       input;
  readonly attribute unsigned long position;
  readonly attribute unsigned long totalSize;
};
input of type LSInput, readonly
position of type unsigned long, readonly
totalSize of type unsigned long, readonly
0 is returned if the total size cannot be determined or estimated.
This interface represents a load event object that signals the completion of a document load.
interface LSLoadEvent : events::Event {
  readonly attribute Document newDocument;
  readonly attribute LSInput  input;
};
input of type LSInput, readonly
newDocument of type Document, readonly
An LSSerializer
provides an API for serializing
(writing) a DOM document out into XML. The XML data is written to
a string or an output stream. Any changes or fixups made during
the serialization affect only the serialized data. The
Document
object and its children are never altered by
the serialization operation.
During serialization of XML data, namespace fixup is done as defined in [DOM Level 3 Core], Appendix B. [DOM Level 2 Core] allows empty strings as a real namespace URI. If the namespaceURI of a Node is the empty string, the serialization will treat it as null, ignoring the prefix if any.
LSSerializer
accepts any node type for
serialization. For nodes of type Document
or
Entity
, well-formed XML will be created when
possible (well-formedness is guaranteed if the document or
entity comes from a parse operation and is unchanged since it
was created). The serialized output for these node types is
either an XML document or an External XML Entity,
respectively, and is acceptable input for an XML parser. For all
other types of nodes the serialized form is implementation
dependent.
Within a Document, DocumentFragment, or Entity being serialized, Nodes are processed as follows:
Document
nodes are written, including the XML
declaration (unless the parameter "xml-declaration"
is set to false
) and a DTD subset, if one exists
in the DOM. Writing a Document
node serializes
the entire document.
Entity nodes, when written directly by LSSerializer.write, output the entity expansion, but no namespace fixup is done. The resulting output will be valid as an external entity.
If the parameter "entities" is set to true, EntityReference nodes are serialized as an entity reference of the form "&entityName;" in the output. Child nodes (the expansion) of the entity reference are ignored. If the parameter "entities" is set to false, only the children of the entity reference are serialized. EntityReference nodes with no children (no corresponding Entity node or the corresponding Entity nodes have no children) are always serialized.
CDATAsections containing content characters that cannot be represented in the specified output encoding are handled according to the "split-cdata-sections" parameter.
If the parameter is set to true, CDATAsections are split, and the unrepresentable characters are serialized as numeric character references in ordinary content. The exact position and number of splits are not specified.
If the parameter is set to false, unrepresentable characters in a CDATAsection are reported as "wf-invalid-character" errors if the parameter "well-formed" is set to true. The error is not recoverable: there is no mechanism for supplying alternative characters and continuing with the serialization.
DocumentFragment
nodes are serialized by
serializing the children of the document fragment in the order
they appear in the document fragment.
Note:
The serialization of a Node
does not always
generate a well-formed
XML document, i.e. a LSParser
might throw fatal
errors when parsing the resulting serialization.
Within the character data of a document (outside of markup), any characters that cannot be represented directly are replaced with character references. Occurrences of '<' and '&' are replaced by the predefined entities &lt; and &amp;. The other predefined entities (&gt;, &apos;, and &quot;) might not be used, except where needed (e.g. using &gt; in cases such as ']]>'). Any characters that cannot be represented directly in the output character encoding are serialized as numeric character references (and since character encoding standards commonly use hexadecimal representations of characters, using the hexadecimal representation when serializing character references is encouraged).
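The replacement of '<' and '&' in character data can be observed with the following informative Java sketch, which serializes a single text node to a string. It assumes a JAXP bootstrap and sets the "xml-declaration" parameter to false so that only the element is written; the class name LsEscapeDemo is hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

final class LsEscapeDemo {

    /** Serializes a document whose root element contains the given text. */
    static String serializeText(String text) {
        DOMImplementation impl;
        try {
            // Assumption: the platform's default JAXP implementation supports "LS" 3.0.
            impl = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().getDOMImplementation();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no JAXP parser available", e);
        }
        Document doc = impl.createDocument(null, "root", null);
        doc.getDocumentElement().setTextContent(text);

        DOMImplementationLS ls = (DOMImplementationLS) impl.getFeature("LS", "3.0");
        LSSerializer serializer = ls.createLSSerializer();
        // Omit the XML declaration so the output is just the element.
        serializer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
        return serializer.writeToString(doc);
    }

    public static void main(String[] args) {
        System.out.println(serializeText("a < b & c"));
    }
}
```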
To allow attribute values to contain both single and double quotes, the apostrophe or single-quote character (') may be represented as "&apos;", and the double-quote character (") as "&quot;". New line characters and other characters that cannot be represented directly in attribute values in the output character encoding are serialized as a numeric character reference.
Within markup, but outside of attributes, any occurrence of a character that cannot be represented in the output character encoding is reported as a DOMError fatal error. An example would be serializing the element <LaCañada/> with encoding="us-ascii". This results in a DOMError "wf-invalid-character-in-node-name" (as proposed in "well-formed").
When requested by setting the parameter "normalize-characters"
on LSSerializer
to true, character normalization is
performed according to the definition of fully normalized characters
included in appendix E of [XML 1.1] on all data to be
serialized, both markup and character data. The character
normalization process affects only the data as it is being
written; it does not alter the DOM's view of the document after
serialization has completed.
Implementations are required to support the encodings "UTF-8",
"UTF-16", "UTF-16BE", and "UTF-16LE" to guarantee that data is
serializable in all encodings that are required to be supported by
all XML parsers. When the encoding is UTF-8, whether or not a byte
order mark is serialized, or if the output is big-endian or
little-endian, is implementation dependent. When the encoding is
UTF-16, whether or not the output is big-endian or little-endian
is implementation dependent, but a Byte Order Mark must be
generated for non-character outputs, such as
LSOutput.byteStream
or
LSOutput.systemId
. If the Byte Order Mark is not
generated, a "byte-order-mark-needed" warning is reported. When
the encoding is UTF-16BE or UTF-16LE, the output is big-endian
(UTF-16BE) or little-endian (UTF-16LE) and the Byte Order Mark is
not generated. In all cases, the encoding declaration, if
generated, will correspond to the encoding used during the
serialization (e.g. encoding="UTF-16"
will appear if
UTF-16 was requested).
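As an informative sketch, the following Java code requests UTF-8 through LSOutput.encoding and writes to a byte stream, so that the generated encoding declaration can be inspected. It assumes a JAXP bootstrap; the class name LsEncodingDemo is hypothetical. Whether a byte order mark precedes the declaration is left to the implementation, so the sketch does not depend on it.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;

final class LsEncodingDemo {

    /** Serializes an empty root element as UTF-8 bytes and decodes them for inspection. */
    static String serializeUtf8() {
        DOMImplementation impl;
        try {
            // Assumption: the platform's default JAXP implementation supports "LS" 3.0.
            impl = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().getDOMImplementation();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no JAXP parser available", e);
        }
        Document doc = impl.createDocument(null, "root", null);
        DOMImplementationLS ls = (DOMImplementationLS) impl.getFeature("LS", "3.0");

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        LSOutput output = ls.createLSOutput();
        output.setByteStream(bytes);   // non-character output
        output.setEncoding("UTF-8");   // requested output encoding
        ls.createLSSerializer().write(doc, output);
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(serializeUtf8());
    }
}
```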
Namespaces are fixed up during serialization: the serialization process will verify that namespace declarations, namespace prefixes, and the namespace URI associated with elements and attributes are consistent. If inconsistencies are found, the serialized form of the document will be altered to remove them. The method used for doing the namespace fixup while serializing a document is the algorithm defined in Appendix B.1, "Namespace normalization", of [DOM Level 3 Core].
While serializing a document, the parameter "discard-default-content" controls whether or not non-specified data is serialized.
While serializing, errors and warnings are reported to the application through the error handler (LSSerializer.domConfig's "error-handler" parameter). This specification does not attempt to define all possible errors and warnings that can occur while serializing a DOM node, but some common error and warning cases are defined. The types (DOMError.type) of errors and warnings defined by this specification are:
"no-output-specified" [fatal]
Raised when writing to a LSOutput if no
output is specified in the LSOutput.
"unbound-prefix-in-entity-reference" [fatal]
Raised if the configuration parameter "namespaces" is set to
true
and an entity whose
replacement text contains unbound namespace prefixes is
referenced in a location where there are no bindings for
the namespace prefixes.
"unsupported-encoding" [fatal]
Raised if an unsupported encoding is encountered.
In addition to raising the defined errors and warnings, implementations are expected to raise implementation-specific errors and warnings for any other error and warning cases, such as I/O errors (file not found, permission denied, ...) and so on.
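As a minimal sketch of the error reporting described above, the following Java code registers a DOMErrorHandler through the "error-handler" parameter and then writes to an LSOutput that has no destination set. The class and method names are illustrative. Whether the failure surfaces as a false return value or as an LSException depends on the implementation, so the sketch handles both.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMErrorHandler;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSException;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

class ErrorHandlerExample {
    /**
     * Writes a document to an LSOutput with no destination and collects the
     * DOMError.type of every reported error. Returns true only if the write
     * call reported success.
     */
    static boolean writeToEmptyOutput(List<String> reportedTypes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            doc.appendChild(doc.createElement("root"));
            // Assumption: the implementation supports the "LS" 3.0 feature.
            DOMImplementationLS ls =
                    (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
            LSSerializer serializer = ls.createLSSerializer();

            // Record each error type; returning false asks the serializer to stop.
            DOMErrorHandler handler = error -> {
                reportedTypes.add(error.getType());
                return false;
            };
            serializer.getDomConfig().setParameter("error-handler", handler);

            // No characterStream, byteStream, or systemId is set on purpose.
            LSOutput empty = ls.createLSOutput();
            try {
                return serializer.write(doc, empty);
            } catch (LSException e) {
                // Some implementations signal the fatal error with SERIALIZE_ERR.
                return false;
            }
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        List<String> types = new ArrayList<>();
        boolean succeeded = writeToEmptyOutput(types);
        System.out.println("succeeded=" + succeeded + ", reported=" + types);
    }
}
```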
interface LSSerializer {
  readonly attribute DOMConfiguration   domConfig;
           attribute DOMString          newLine;
           attribute LSSerializerFilter filter;
  boolean   write(in Node nodeArg, in LSOutput destination)
                                        raises(LSException);
  boolean   writeToURI(in Node nodeArg, in DOMString uri)
                                        raises(LSException);
  DOMString writeToString(in Node nodeArg)
                                        raises(DOMException, LSException);
};
domConfig
of type DOMConfiguration
, readonly
The DOMConfiguration
object used by the
LSSerializer
when serializing a DOM node.
DOMConfiguration
objects for
LSSerializer
add, or modify, the following
parameters:
"canonical-form"
true
Setting this parameter to
true
will set the parameters "format-pretty-print",
"discard-default-content",
and "xml-declaration"
to false
. Setting one of those
parameters to true
will set this
parameter to false
. Serializing an XML
1.1 document when "canonical-form" is
true
will generate a fatal error.
false
"discard-default-content"
true
Use the
Attr.specified
attribute to
decide which attributes should be discarded. Note
that some implementations might use whatever
information is available to the implementation
(e.g. XML schema, DTD, the
Attr.specified
attribute, and so on) to
determine what attributes and content to discard if
this parameter is set to true
.
false
"format-pretty-print"
true
false
"ignore-unknown-character-denormalizations"
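true
If, while verifying full normalization when [XML 1.1] is
supported, a character is encountered for which the
normalization properties cannot be determined, then raise an
"unknown-character-denormalization"
warning (instead of raising an error, if this
parameter is not set) and ignore any possible
denormalizations caused by these characters.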
false
"normalize-characters"
This parameter is equivalent to the one defined by
DOMConfiguration
in [DOM Level 3 Core]. Unlike in the Core, the default value for
this parameter is true
. While DOM
implementations are not required to support fully normalizing
the characters in the document according to appendix E of
[XML 1.1], this parameter must be activated by
default if supported.
"xml-declaration"
true
If a
Document
, Element
,
or Entity
node is serialized, the XML
declaration, or text declaration, should be
included. The version
(Document.xmlVersion
if the document
is a Level 3 document and the version is non-null,
otherwise use the value "1.0"), and the output
encoding (see LSSerializer.write
for
details on how to find the output encoding) are
specified in the serialized XML declaration.
false
Do not serialize the XML and text declarations. Report an
"xml-declaration-needed"
warning if
this will cause problems (i.e. the serialized data
is of an XML version other than [XML 1.0],
or an encoding would be needed to be able to
re-parse the serialized data).
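The parameters above are set through the DOMConfiguration returned by LSSerializer.domConfig. The following Java sketch shows one way to do that; the helper setIfSupported and the class name are illustrative, and it assumes the implementation supports the "LS" 3.0 feature. Because some parameter values are optional, the sketch checks canSetParameter before setting a value.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

class SerializerConfigExample {
    /** Sets a parameter only if the implementation accepts that value. */
    static boolean setIfSupported(DOMConfiguration config, String name, Object value) {
        if (config.canSetParameter(name, value)) {
            config.setParameter(name, value);
            return true;
        }
        return false;
    }

    /** Serializes a small document with the XML declaration suppressed. */
    static String serializeWithoutDeclaration() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            doc.appendChild(doc.createElement("config"));
            // Assumption: the implementation supports the "LS" 3.0 feature.
            DOMImplementationLS ls =
                    (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
            LSSerializer serializer = ls.createLSSerializer();
            DOMConfiguration config = serializer.getDomConfig();

            // Omit the XML declaration from the output.
            setIfSupported(config, "xml-declaration", Boolean.FALSE);
            // Pretty printing may be unsupported, so it is requested conditionally.
            setIfSupported(config, "format-pretty-print", Boolean.TRUE);

            return serializer.writeToString(doc);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serializeWithoutDeclaration());
    }
}
```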
filter
of type LSSerializerFilter
The filter is invoked after the operations requested by the
DOMConfiguration
parameters have been applied. For
example, CDATA sections won't be passed to the filter if
"cdata-sections"
is set to false
.
newLine
of type DOMString
Setting this attribute to
null
will reset its value to the default value.
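A small Java sketch of the newLine attribute follows. The class and method names are illustrative, and it assumes the implementation supports the "LS" 3.0 feature. It reads the default value and then the value after an explicit assignment, and makes no claim about what the default sequence is.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

class NewLineExample {
    /** Returns { default newLine, newLine after setting it to CR LF }. */
    static String[] newLineBeforeAndAfter() {
        try {
            DOMImplementation impl =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder().getDOMImplementation();
            // Assumption: the implementation supports the "LS" 3.0 feature.
            DOMImplementationLS ls = (DOMImplementationLS) impl.getFeature("LS", "3.0");
            LSSerializer serializer = ls.createLSSerializer();

            String defaultNewLine = serializer.getNewLine(); // implementation-specific default
            serializer.setNewLine("\r\n");                   // explicit end-of-line sequence
            String explicitNewLine = serializer.getNewLine();
            return new String[] {defaultNewLine, explicitNewLine};
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        String[] values = newLineBeforeAndAfter();
        System.out.println("explicit newLine is CR LF: " + "\r\n".equals(values[1]));
    }
}
```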
write
Serialize the specified node as described above in the general
description of the
LSSerializer
interface. The
output is written to the supplied LSOutput
.
When writing to a
LSOutput
, the encoding is found
by looking at the encoding information that is reachable through
the LSOutput
and the item to be written (or its
owner document) in this order:
LSOutput.encoding
,
Document.inputEncoding
,
Document.xmlEncoding
.
If no output is specified in the
LSOutput
, a
"no-output-specified" fatal error is raised.
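The encoding lookup order for write can be sketched as a small Java helper. The method resolveEncoding is hypothetical and only mirrors the order listed above; the final fallback to "UTF-8" is an assumption made for this sketch and is not taken from the text above.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;

class EncodingResolution {
    /** Mirrors the lookup order: LSOutput.encoding, Document.inputEncoding, Document.xmlEncoding. */
    static String resolveEncoding(LSOutput output, Node node) {
        if (output != null && output.getEncoding() != null) {
            return output.getEncoding();
        }
        // Use the node itself if it is a document, otherwise its owner document.
        Document doc = (node instanceof Document) ? (Document) node : node.getOwnerDocument();
        if (doc != null) {
            if (doc.getInputEncoding() != null) {
                return doc.getInputEncoding();
            }
            if (doc.getXmlEncoding() != null) {
                return doc.getXmlEncoding();
            }
        }
        return "UTF-8"; // assumption: fallback chosen for this sketch
    }

    /** Resolves the encoding for a freshly created document and an optional requested encoding. */
    static String demo(String requestedEncoding) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            doc.appendChild(doc.createElement("root"));
            // Assumption: the implementation supports the "LS" 3.0 feature.
            DOMImplementationLS ls =
                    (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
            LSOutput output = ls.createLSOutput();
            if (requestedEncoding != null) {
                output.setEncoding(requestedEncoding);
            }
            return resolveEncoding(output, doc.getDocumentElement());
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("UTF-16LE"));
        System.out.println(demo(null));
    }
}
```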
nodeArg
of type
Node
destination
of type
LSOutput
Returns true if the node was successfully serialized.
SERIALIZE_ERR: Raised if the LSSerializer was unable to serialize the node.
writeToString
Serialize the specified node as described above in the general
description of the
LSSerializer
interface. The
output is written to a DOMString
that is returned
to the caller. The encoding used is the encoding of the
DOMString
type, i.e. UTF-16. Note that no Byte
Order Mark is generated in a DOMString
object.
nodeArg
of type
Node
Returns the serialized data.
DOMSTRING_SIZE_ERR: Raised if the resulting string is too long to
fit in a DOMString.
SERIALIZE_ERR: Raised if the LSSerializer was unable to serialize the node.
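As an illustration of writeToString, the following Java sketch serializes a small document into a string. The class name and the document content are made up for the example, and it assumes the implementation supports the "LS" 3.0 feature.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

class WriteToStringExample {
    /** Builds <greeting>hello</greeting> and returns its serialized form. */
    static String serialize() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element greeting = doc.createElement("greeting");
            greeting.appendChild(doc.createTextNode("hello"));
            doc.appendChild(greeting);

            // Assumption: the implementation supports the "LS" 3.0 feature.
            DOMImplementationLS ls =
                    (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
            LSSerializer serializer = ls.createLSSerializer();
            // The result is a string, so no Byte Order Mark is expected at its start.
            return serializer.writeToString(doc);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize());
    }
}
```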
writeToURI
A convenience method that acts as if
LSSerializer.write
was called with a
LSOutput
with no encoding specified and
LSOutput.systemId
set to the uri
argument.
nodeArg
of type
Node
uri
of type
DOMString
Returns true if the node was successfully serialized.
SERIALIZE_ERR: Raised if the LSSerializer was unable to serialize the node.
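A hedged Java sketch of writeToURI follows. It writes to a temporary file through a file: URI and reads the bytes back. The class name and file prefix are illustrative. Reading the result as UTF-8 is an assumption: no encoding is specified and a newly created document carries no encoding information, so the sketch relies on the implementation choosing UTF-8 in that case.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

class WriteToUriExample {
    /** Serializes a document to a temporary file URI and returns the file content. */
    static String writeAndReadBack() {
        try {
            Path file = Files.createTempFile("ls-demo", ".xml");
            file.toFile().deleteOnExit();

            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            doc.appendChild(doc.createElement("note"));
            // Assumption: the implementation supports the "LS" 3.0 feature.
            DOMImplementationLS ls =
                    (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
            LSSerializer serializer = ls.createLSSerializer();

            // Behaves like write() with an LSOutput whose systemId is this URI.
            boolean ok = serializer.writeToURI(doc, file.toUri().toString());
            if (!ok) {
                throw new IllegalStateException("Serialization did not complete normally");
            }
            // Assumption: the output was encoded as UTF-8.
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeAndReadBack());
    }
}
```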
This interface represents an output destination for data.
This interface allows an application to encapsulate information about an output destination in a single object, which may include a URI, a byte stream (possibly with a specified encoding), a base URI, and/or a character stream.
The exact definitions of a byte stream and a character stream are binding dependent.
The application is expected to provide objects that implement
this interface whenever such objects are needed. The application
can either provide its own objects that implement this
interface, or it can use the generic factory method
DOMImplementationLS.createLSOutput()
to create
objects that implement this interface.
The LSSerializer
will use the
LSOutput
object to determine where to serialize
the output to. The LSSerializer
will look at the
different outputs specified in the LSOutput
in the
following order to know which one to output to; the first one
that is not null and not an empty string will be used:
1. LSOutput.characterStream
2. LSOutput.byteStream
3. LSOutput.systemId
LSOutput
objects belong to the application. The
DOM implementation will never modify them (though it may make
copies and modify the copies, if necessary).
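The lookup order among the outputs of an LSOutput can be observed with the following Java sketch, which sets both a character stream and a byte stream on the same object. The class and method names are illustrative, and it assumes the implementation supports the "LS" 3.0 feature. If the character stream is consulted first, only the character stream should receive data.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

class OutputPrecedenceExample {
    /** Returns { characters written to the writer, bytes written to the byte stream }. */
    static int[] writeWithBothStreams() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            doc.appendChild(doc.createElement("root"));
            // Assumption: the implementation supports the "LS" 3.0 feature.
            DOMImplementationLS ls =
                    (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
            LSSerializer serializer = ls.createLSSerializer();

            StringWriter characters = new StringWriter();
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            LSOutput output = ls.createLSOutput();
            output.setCharacterStream(characters); // expected to be used
            output.setByteStream(bytes);           // expected to be ignored

            serializer.write(doc, output);
            return new int[] {characters.toString().length(), bytes.size()};
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        int[] counts = writeWithBothStreams();
        System.out.println("characters=" + counts[0] + ", bytes=" + counts[1]);
    }
}
```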
interface LSOutput {
  // Depending on the language binding in use,
  // this attribute may not be available.
           attribute LSWriter       characterStream;
           attribute LSOutputStream byteStream;
           attribute DOMString      systemId;
           attribute DOMString      encoding;
};
byteStream
of type LSOutputStream
characterStream
of type LSWriter
encoding
of type DOMString
systemId
of type DOMString
LSSerializerFilters
provide applications the
ability to examine nodes as they are being serialized and decide
which nodes should be serialized. The
LSSerializerFilter
interface is based on the
NodeFilter
interface defined in [DOM Level 2 Traversal and Range].
Document
, DocumentType
,
DocumentFragment
, Notation
,
Entity
, and children of Attr
nodes are
not passed to the filter. The child nodes of an
EntityReference
node are only passed to the filter if
the EntityReference
node is skipped by the method
LSSerializerFilter.acceptNode()
.
When serializing an Element
, the element is passed
to the filter before any of its attributes are passed to the
filter. Namespace declaration attributes, and default attributes
(except in the case when "discard-default-content"
is set to false
), are never passed to the filter.
The result of any attempt to modify a node passed to a
LSSerializerFilter
is implementation dependent.
DOM applications must not raise exceptions in a filter. The effect of throwing exceptions from a filter is DOM implementation dependent.
For efficiency, a node passed to the filter may not be the same as the one that is actually in the tree, and the actual node (node object identity) may be reused during the process of filtering and serializing a document.
interface LSSerializerFilter : traversal::NodeFilter {
  readonly attribute unsigned long whatToShow;
};
whatToShow
of type unsigned long
, readonly
Tells the
LSSerializer
what types of nodes to show
to the filter. If a node is not shown to the filter using this
attribute, it is automatically serialized. See
NodeFilter
for definition of the constants. The
constants SHOW_DOCUMENT
,
SHOW_DOCUMENT_TYPE
,
SHOW_DOCUMENT_FRAGMENT
, SHOW_NOTATION
,
and SHOW_ENTITY
are meaningless here; such nodes
will never be passed to a LSSerializerFilter
.
The SHOW_ATTRIBUTE
constant indicates that the
Attr
nodes are shown and passed to the filter.
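A minimal Java sketch of an LSSerializerFilter follows. The class DropComments and the sample document are illustrative, and the sketch assumes an implementation that supports the "LS" 3.0 feature and honors serializer filters. The filter asks to see only comment nodes through whatToShow and rejects each one, so all other nodes are serialized without consulting the filter.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.w3c.dom.ls.LSSerializerFilter;
import org.w3c.dom.traversal.NodeFilter;

class CommentFilterExample {
    /** Shows only comments to the filter and rejects every one of them. */
    static final class DropComments implements LSSerializerFilter {
        @Override
        public int getWhatToShow() {
            return NodeFilter.SHOW_COMMENT;
        }

        @Override
        public short acceptNode(Node node) {
            // The node must not be modified here, and no exception is thrown.
            return NodeFilter.FILTER_REJECT;
        }
    }

    /** Serializes <list><!--internal note--><item>kept</item></list> with the filter applied. */
    static String serializeWithoutComments() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element list = doc.createElement("list");
            list.appendChild(doc.createComment("internal note"));
            Element item = doc.createElement("item");
            item.appendChild(doc.createTextNode("kept"));
            list.appendChild(item);
            doc.appendChild(list);

            // Assumption: the implementation supports the "LS" 3.0 feature.
            DOMImplementationLS ls =
                    (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
            LSSerializer serializer = ls.createLSSerializer();
            serializer.setFilter(new DropComments());
            return serializer.writeToString(doc);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serializeWithoutComments());
    }
}
```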