Parses the XML information and builds up the DOM tree in memory
providing a Tcl object command to this DOM document object.
Example:
dom parse $xml doc
$doc documentElement root
parses the XML in the variable xml, creates the DOM tree in
memory, makes a reference to the document object, visible in Tcl as
a document object command, and assigns this new object name to the
variable doc. When doc gets freed, the DOM tree and the associated
Tcl command objects (document and all node objects) are freed
automatically.
set document [dom parse $xml]
set root [$document documentElement]
parses the XML in the variable xml, creates the DOM tree in
memory, makes a reference to the document object, visible in Tcl as
a document object command, and returns this new object name, which
is then stored in document. To free the underlying
DOM tree and the associated Tcl object commands (document + nodes
+ fragment nodes) the document object command has to be explicitly
deleted by:
$document delete
or
rename $document ""
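The two calling forms can be compared side by side. The following is a minimal sketch, assuming the tdom package is installed; the sample XML and the proc name countItems are made up for illustration.

```tcl
package require tdom

# Hypothetical sample data, used only for this illustration.
set xml {<order id="42"><item>tea</item><item>milk</item></order>}

# Form 1: the object name is stored in the variable doc. The tree is
# freed automatically when doc goes away (here: when the proc returns).
proc countItems {xml} {
    dom parse $xml doc
    $doc documentElement root
    return [llength [$root selectNodes item]]
}
set itemCount [countItems $xml]
puts $itemCount

# Form 2: the object command name is returned and must be deleted
# explicitly once the tree is no longer needed.
set document [dom parse $xml]
set root [$document documentElement]
set rootName [$root nodeName]
set orderId [$root getAttribute id]
puts "$rootName $orderId"
$document delete
```

The first form suits short-lived trees inside a proc, while the second keeps the tree alive until it is deleted.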
The valid options are:
- -simple
- If -simple is specified, a simple but fast
parser is used (it does not fully conform to the XML recommendation). That
should double parsing and DOM generation speed. The encoding of the
data is not transformed inside the parser. The simple parser does
not respect any encoding information in the XML declaration. It
skips over the internal DTD subset and ignores any information in
it. Therefore it doesn't include defaulted attribute values in
the tree, even if the corresponding attribute declaration is in the
internal subset. It also doesn't expand internal or external entity
references other than the predefined entities and character
references.
- -html
- If -html is specified, a fast HTML parser is
used, which tries to parse even badly formed HTML into a DOM
tree.
- -html5
- This option is only available if tDOM was built with
--enable-html5. Use the featureinfo method if you
need to know whether this feature is built in. If -html5 is specified, the gumbo lib html5 parser
(https://github.com/google/gumbo-parser) is used to build the DOM
tree. This is, as far as it goes, XML namespace-aware. Since many
users probably do not want this and it adds a burden with no
benefit in many use cases, -html5 can be combined
with -ignorexmlns, in which case no node or
attribute in the DOM tree is in an XML namespace. All tag and
attribute names in the DOM tree will be lower case, even for
foreign elements not in the xhtml, svg or mathml namespace. The DOM
tree may include nodes that the parser inserted because they are
implied by the context (such as <head>, <tbody>, etc.).
- -json
- If -json is specified, the data is expected to be a valid JSON string (according to
RFC 7159). The command returns an ordinary DOM document with the
nesting tokens inside the JSON data translated into tree hierarchy.
If a JSON array value is itself an array or object then container
element nodes named (in a default build) arraycontainer or
objectcontainer, respectively, are inserted into the tree. The JSON
serialization of this document (with the domDoc method asJSON) is the same JSON information as the data, preserving JSON datatypes, allowing non-unique member
names of objects while preserving their order and the full range of
JSON string values. JSON datatype handling is done with an
additional property "sticking" to the doc and tree nodes. This
property isn't contained in an XML serialization of the document.
If you need to store the JSON data represented by a document, store
the JSON serialization and parse it back from there. Apart from
this JSON type information the returned doc command or handle is an
ordinary DOM doc, which may be investigated or modified with the
full range of the doc and node methods. Please note that the
element node names and the text node values within the tree may be
outside of what the appropriate XML productions allow.
- -jsonroot <document element name>
- If given, makes the given element name the document element of
the resulting doc. The parsed content of the JSON string will be
the children of this document element node.
- -jsonmaxnesting integer
- This option only has an effect if used together with the -json option. The current implementation uses a recursive
descent JSON parser. In order to avoid using excess stack space,
any JSON input that has more than a certain number of nesting levels is
considered invalid. The default maximum nesting is 2000. The option
-jsonmaxnesting allows the user to adjust that.
- --
- The option -- marks the end of options. While
respected in general, this option is only needed when parsing
JSON data, which may start with a "-".
- -keepEmpties
- If -keepEmpties is specified then text nodes
which contain only whitespace will be part of the resulting DOM
tree. By default (-keepEmpties not given)
those empty text nodes are removed at parsing time.
- -keepCDATA
- If -keepCDATA is specified then CDATA sections
are added to the tree as CDATA_SECTION_NODEs. Without this option
they are added as text nodes (and, if necessary, combined
with sibling text nodes into one text node). Please note that
the resulting tree isn't prepared for XPath selects or to be the
source or the stylesheet of an XSLT transformation. If not combined
with -keepEmpties, only CDATA sections that are not
whitespace-only will be added to the resulting DOM tree.
- -channel <channel-ID>
- If -channel <channel-ID> is specified,
the input to be parsed is read from the specified channel. The
encoding setting of the channel (via fconfigure -encoding) is
respected, i.e. the data read from the channel is converted to UTF-8
according to the encoding settings before the data is parsed.
- -baseurl <baseURI>
- If -baseurl <baseURI> is specified, the
baseURI is used as the base URI of the document. External entity
references in the document are resolved relative to this base URI.
This base URI is also stored within the DOM tree.
- -feedbackAfter <#bytes>
- If -feedbackAfter <#bytes> is specified,
the Tcl command given by -feedbackcmd is evaluated
at the first element start within the document (or an external
entity) once #bytes have been read since the start of the document
or external entity, or since the last such call. For backward
compatibility, if no
-feedbackcmd is given but there is a Tcl proc named
::dom::domParseFeedback, this proc is used as -feedbackcmd. If there
isn't such a proc and -feedbackAfter is used, it is an error to not
also use -feedbackcmd. If the called script raises an error, then
parsing will be aborted and the dom parse call
returns an error, with the script error message as error message. If the
called script returns -code break, the parsing will
abort and the dom parse call will return the empty
string.
- -feedbackcmd <script>
- If -feedbackcmd <script> is specified,
the script is evaluated at the first
element start within the document (or an external entity) once the
number of bytes given by the -feedbackAfter option
has been read since the start of the document or external entity,
or since the last such call. If -feedbackAfter isn't given, using this
option doesn't have any effect. If the called script raises an error,
then parsing will be aborted and the dom parse call
returns an error, with the script error message as error message. If the
called script returns -code break, the parsing will
abort and the dom parse call will return the empty
string.
- -externalentitycommand <script>
- If -externalentitycommand <script> is
specified, the specified Tcl script is called to resolve any
external entities of the document. The actual evaluated command
consists of this option followed by three arguments: the base URI,
the system identifier of the entity and the public identifier of
the entity. The base URI and the public identifier may be the empty
list. The script has to return a Tcl list consisting of three
elements. The first element of this list signals how the external
entity is returned to the processor. Currently the two allowed
types are "string" and "channel". The second element of the list
has to be the (absolute) base URI of the external entity to be
parsed. The third element of the list is the data, either the already
read data out of the external entity as string in the case of type
"string", or the name of a Tcl channel, in the case of type
"channel". Note that if the script returns a Tcl channel, it will
not be closed by the processor. It must be closed separately if it
is no longer needed.
- -useForeignDTD <boolean>
- If <boolean> is true and the document does not have an
external subset, the parser will call the -externalentitycommand
script with empty values for the systemId and publicID arguments.
Please note that if the document also doesn't have an internal
subset, the -startdoctypedeclcommand and -enddoctypedeclcommand
scripts, if set, are not called. The -useForeignDTD respects
- -paramentityparsing <always|never|notstandalone>
- The -paramentityparsing option controls whether
the parser tries to resolve the external entities (including the
external DTD subset) of the document while building the DOM tree.
-paramentityparsing requires an argument, which
must be either "always", "never", or "notstandalone". The value
"always" means that the parser tries to resolve (recursively) all
external entities of the XML source. This is the default in case
-paramentityparsing is omitted. The value "never"
means that only the given XML source is parsed and no external
entity (including the external subset) will be resolved and parsed.
The value "notstandalone" means that all external entities will be
resolved and parsed, with the exception of documents that
explicitly state standalone="yes" in their XML declaration.
- -ignorexmlns
- It is recommended that you only use this option with the
-html5 option. If this option is given, no node
within the created DOM tree will be internally marked as placed
into an XML Namespace, even if there is a default namespace in
scope for un-prefixed elements or even if the element has a defined
namespace prefix. One consequence is that XPath node expressions on
such a DOM tree don't work as may be expected. Prefixed element
nodes can't be selected naively and element nodes without prefix
will be seen by XPath expressions as if they are not in any
namespace (no matter whether they should in fact be in a default
namespace). If you need to inject prefixed node names into an XPath
expression use the '%' syntax described in the documentation of the
domNode command method selectNodes.
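A few of the parse options above are easiest to see in action. The following is a minimal sketch, assuming the tdom package is installed; the JSON and XML samples and all variable names are made up for illustration.

```tcl
package require tdom

# -json with -jsonroot: the parsed members become children of the
# element named by -jsonroot. "--" ends option parsing, which matters
# when the JSON data itself starts with "-".
set json {{"name":"tea","tags":["hot","green"],"stock":12}}
set jdoc [dom parse -json -jsonroot data -- $json]
set jroot [$jdoc documentElement]
set jrootName [$jroot nodeName]
set teaName [$jroot selectNodes string(name)]
puts "$jrootName $teaName"
$jdoc delete

# Without -jsonroot the tree can be serialized back with asJSON.
set jdoc2 [dom parse -json -- $json]
puts [$jdoc2 asJSON]
$jdoc2 delete

# -keepEmpties: whitespace-only text nodes stay in the tree.
set xml "<a>\n  <b/>\n</a>"
set d1 [dom parse $xml]
set d2 [dom parse -keepEmpties $xml]
set plainCount [llength [[$d1 documentElement] childNodes]]
set keptCount  [llength [[$d2 documentElement] childNodes]]
puts "$plainCount $keptCount"
$d1 delete
$d2 delete
```

In the last part, the default parse keeps only the b element, while -keepEmpties also keeps the whitespace text nodes before and after it.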
This method creates Tcl commands, which in turn create tDOM
nodes. Tcl commands created by this command are only available
inside a script given to the domNode methods appendFromScript or insertBeforeFromScript. If a command created with createNodeCmd is invoked in any other context, it
returns an error. The created command commandName
replaces any existing command or procedure with that name. If the
commandName includes any namespace qualifiers, it
is created in the specified namespace. The -tagName option is only allowed for the elementNode type.
The -jsonType option is only allowed for
elementNode and textNode types.
If such a command is invoked inside a script given as argument to
the domNode method appendFromScript or insertBeforeFromScript it creates a new node and appends
this node at the end of the child list of the invoking element
node. If the option -returnNodeCmd was given, the
command returns the created node as Tcl command. If this option was
omitted, the command returns nothing. Each command always creates
the same type of node. Which type of node is created by the command
is determined by the first argument to the createNodeCmd. The syntax of the created command depends on
the type of the node it creates.
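The behaviour described above can be sketched as follows. This is a minimal illustration, assuming the tdom package is installed; the command names item, t and note and the document element name list are arbitrary choices for this example.

```tcl
package require tdom

# Create node commands. The names are arbitrary for this sketch.
dom createNodeCmd elementNode item
dom createNodeCmd textNode t
dom createNodeCmd -returnNodeCmd elementNode note

set doc [dom createDocument list]
set root [$doc documentElement]

# Node commands only work inside an appendFromScript (or
# insertBeforeFromScript) script.
$root appendFromScript {
    item { t "tea" }
    item { t "milk" }
    # -returnNodeCmd makes the command return the new node command.
    set noteNode [note { t "fragile" }]
}
set childCount [llength [$root childNodes]]
set noteName [$noteNode nodeName]
puts "$childCount $noteName"

# Outside such a script the command raises an error.
set failedOutside [catch {item} msg]
puts $failedOutside
puts [$root asXML]
```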
If the command type to create is elementNode,
the created command will create an element node, if called. Without
the -tagName option the tag name of the created
node is commandName without namespace qualifiers.
If the -tagName option was given then the elements
created by the command will have this tag name. If the
-jsonType option was given then the created element
nodes will have the given JSON type. If the -namespace option is given the created element node will be
XML namespaced and in the namespace given by the option. The
element name will be used literally as given either by the command name or
the -tagName option, if that was given. An
appropriate XML namespace declaration will be automatically added,
to bind the prefix (if the element name has one) or the default
namespace (if the element name has no prefix) to the namespace if
such a binding isn't in scope.
The syntax of the created command is:
elementNodeCmd ?attributeName attributeValue ...? ?script?
elementNodeCmd ?-attributeName attributeValue ...? ?script?
elementNodeCmd name_value_list script
The command syntax allows three different ways to specify the
attributes of the resulting element. These can be specified with
attributeName attributeValue argument pairs, in an
"option style" way with -attributeName
attributeValue argument pairs (the '-' character is only
syntactic sugar and will be stripped off) or as a Tcl list with
elements interpreted as attribute name and the corresponding
attribute value. The attribute name elements in the list may have a
leading '-' character, which will be stripped off.
Every elementNodeCmd accepts an optional Tcl
script as last argument. This script is evaluated as recursive
appendFromScript script with the node created by
the elementNodeCmd as parent of all nodes created
by the script.
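The three attribute styles can be compared directly. The following is a minimal sketch, assuming the tdom package is installed; the command name p, the attribute names and the document element name body are made up for illustration. Each call passes an (empty) script as last argument so that the argument forms are unambiguous.

```tcl
package require tdom

dom createNodeCmd elementNode p

set doc [dom createDocument body]
set root [$doc documentElement]
$root appendFromScript {
    # Plain name/value pairs, followed by a script.
    p class intro lang en {}
    # Option style: the leading '-' is stripped off.
    p -class intro -lang en {}
    # A single Tcl list of names and values, followed by a script.
    p {class intro lang en} {}
}

# Collect the attributes of every created element for comparison.
set attrs {}
foreach node [$root childNodes] {
    lappend attrs [$node getAttribute class] [$node getAttribute lang]
}
puts $attrs
puts [$root asXML]
```

All three p elements should end up with the same class and lang attributes.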
If the first argument of the method is textNode, the command will create a text node. If the
-jsonType option was given then the created text
node will have that JSON type. The syntax of the created command
is:
textNodeCmd ?-disableOutputEscaping? data
If the optional flag -disableOutputEscaping is
given, the escaping of the ampersand character (&) and the left
angle bracket (<) inside the data is disabled. You should use
this flag carefully.
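The effect of the flag shows up in the serialization. The following is a minimal sketch, assuming the tdom package is installed; the command names div and t and the sample strings are made up for illustration.

```tcl
package require tdom

dom createNodeCmd elementNode div
dom createNodeCmd textNode t

set doc [dom createDocument body]
set root [$doc documentElement]
$root appendFromScript {
    # Default: '&' and '<' are escaped when the tree is serialized.
    div { t "a < b & c" }
    # With the flag the data is written out as given, so the caller is
    # responsible for it being well-formed markup.
    div { t -disableOutputEscaping "<br/>" }
}
set firstText [[$root firstChild] text]
set serialized [$root asXML]
puts $serialized
```

The text value stored in the tree is unchanged in both cases. Only the way asXML writes it differs.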
If the first argument of the method is commentNode or cdataNode the command will
create a comment node or CDATA section node. The syntax of the
created command is:
nodeCmd data
If the first argument of the method is piNode,
the command will create a processing instruction node. The syntax
of the created command is:
piNodeCmd target data
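The remaining node types follow the same pattern. The following is a minimal sketch, assuming the tdom package is installed; the command names c, cd, pi and entry and the sample data are made up for illustration.

```tcl
package require tdom

dom createNodeCmd commentNode c
dom createNodeCmd cdataNode cd
dom createNodeCmd piNode pi
dom createNodeCmd elementNode entry

set doc [dom createDocument feed]
set root [$doc documentElement]
$root appendFromScript {
    c " generated "                   ;# comment node
    pi xml-stylesheet {href="s.css"}  ;# processing instruction: target, data
    entry { cd "1 < 2" }              ;# CDATA section inside an element
}

# Record the node type of each direct child of the root.
set types {}
foreach node [$root childNodes] {
    lappend types [$node nodeType]
}
puts $types
puts [$root asXML]
```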