tdom::schema - Creates a schema validation command
package require tdom
tdom::schema ?create? cmdName
Every call of this command creates a new validation command. A
validation command has methods to define a schema and is able to
validate XML data or to post-validate a tDOM DOM tree (and to some
degree other kinds of hierarchical data) against this schema.
Also, a validation command may be used as the argument to the
-validateCmd option of the dom parse and the expat commands to
enable validation in addition to what they otherwise do.
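As a minimal sketch of the workflow described above (the command name grammar and the element names are arbitrary choices for illustration), a validation command can be created, given a small schema, and then used both directly and through dom parse:

```tcl
package require tdom

# Create a validation command; the name "grammar" is arbitrary.
tdom::schema grammar

# Minimal schema: <doc> contains one <a> followed by an optional <b>.
grammar defelement doc {
    element a
    element b ?
}

# Validate XML strings directly.
set okResult  [grammar validate {<doc><a/><b/></doc>}]
set badResult [grammar validate {<doc><b/></doc>}]

# Validate while parsing into a DOM tree; invalid input raises an error.
set parseFailed [catch {
    dom parse -validateCmd grammar {<doc><b/></doc>} tree
} parseError]

grammar delete
```

Here okResult is expected to be true, badResult false, and parseFailed non-zero, because the required element a is missing in the second document.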
The methods of created commands are:
- prefixns ?prefixUriList?
- This method controls prefix (or abbreviation) to namespace URI
mapping. Wherever a namespace argument is expected in the schema
command methods the "prefix" could be used instead of the namespace
URI. If the list maps the same prefix to different namespace URIs,
the first one wins. If there is no such prefix, the namespace
argument is used literally as namespace URI. If the method is
called without argument, it returns the current prefixUriList. If
the method is called with the empty string, any namespace URI
arguments are used literally. This is the default.
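A short sketch of the prefix mapping, assuming a made-up namespace URI http://example.org/ns and the prefix ex:

```tcl
package require tdom

tdom::schema grammar

# "ex" abbreviates a hypothetical namespace URI.
grammar prefixns {ex http://example.org/ns}

# The prefix stands in for the full namespace URI in the namespace argument.
grammar defelement doc ex {
    element item *
}

# Called without argument, the method returns the current mapping.
set mapping [grammar prefixns]

set inNamespace [grammar validate \
    {<doc xmlns="http://example.org/ns"><item/><item/></doc>}]
# A <doc> in no namespace is not defined by this schema.
set noNamespace [grammar validate {<doc><item/></doc>}]

grammar delete
```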
- defelement name
?namespace? <definition
script>
- This method defines the element name (optionally
in the namespace namespace) in the schema. The
definition script is evaluated and defines the
content model of the element. If the namespace
argument is given, any element or ref references in the definition script not wrapped inside
a namespace command are resolved in that
namespace. If there is already an element definition for the
name/namespace combination, the command raises an error.
- defelementtype typename
name ?namespace? <definition script>
- This method defines the element type typename
(optionally in the namespace namespace) in the
schema. If the element type is used in a definition script with the
schema command elementtype, the validation engine expects an
element named name (in the namespace namespace, if given) with the
content model given by the definition script. Defining element types seems only
sensible if you really have elements with the same name and
namespace but different content models. The definition
script is evaluated and defines the content model of the
element. If the namespace argument is given, any
element or ref references in the
definition script not wrapped inside a namespace
command are resolved in that namespace. If there is already an
elementtype definition for the name/namespace combination, the
command raises an error. The document element of any XML to validate
cannot be a defelementtype defined element.
- defpattern name
?namespace? <definition
script>
- This method defines a (maybe complex) content particle with the
name (optionally in the namespace namespace) in the schema, to be used in other definition
scripts with the definition command ref. The
definition script is evaluated and defines the
content model of the content particle. If the namespace argument is given, any element
or ref references in the definition script not
wrapped inside a namespace command are resolved in
that namespace. If there is already a pattern definition for the
name/namespace combination, the command raises an error.
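The following sketch shows one way a named pattern might be shared between two element definitions; the pattern and element names are invented for illustration:

```tcl
package require tdom

tdom::schema grammar

# A reusable content particle: a <name> followed by an optional <email>.
grammar defpattern contact {
    element name
    element email ?
}

# Both elements reuse the pattern through the ref definition command.
grammar defelement author {
    ref contact
}
grammar defelement editor {
    ref contact
    element role
}

set authorOk  [grammar validate {<author><name/></author>}]
set editorOk  [grammar validate {<editor><name/><email/><role/></editor>}]
# <name> is required by the pattern, so this document is invalid.
set editorBad [grammar validate {<editor><role/></editor>}]

grammar delete
```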
- deftexttype name
<constraint script>
- This method defines a bundle of text constraints that can be
referred to by name while defining constraints on
text element or attribute values. If there is already a text type
definition with this name, the command raises an error. A text type
must be defined before it can be used in schema definition
scripts.
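A sketch of a named text type used for both an attribute and element content. The type name byteValue and the element names are assumptions made for this example; the constraint unsignedByte is one of the text constraint commands of this package:

```tcl
package require tdom

tdom::schema grammar

# Define the text type before it is used in any definition script.
grammar deftexttype byteValue {
    unsignedByte
}

grammar defelement pixel {
    # Optional attribute whose value must satisfy the text type.
    attribute alpha ? type byteValue
    # Text-only child element checked against the same text type.
    element red ! {text type byteValue}
}

set inRange  [grammar validate {<pixel alpha="255"><red>12</red></pixel>}]
set tooLarge [grammar validate {<pixel><red>256</red></pixel>}]

grammar delete
```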
- start documentElement
?namespace?
- This method defines the name and namespace of the root element
of a tree to validate. If this method is used, the root element
must match for validity. If start is not used, any
element defined by defelement may be the root of a
valid document. The start method may be used
several times with varying arguments during the lifetime of a
validation command. If the command is called with just the empty
string (and no namespace argument), the validation constraint for
the root element is removed and any defined element will be valid
as root of a tree to validate.
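The effect of start on the accepted root element could be demonstrated like this (element names are illustrative):

```tcl
package require tdom

tdom::schema grammar
grammar defelement doc { element a }
grammar defelement a { text }

# Without a start constraint, any element defined by defelement may be the root.
set anyRoot [grammar validate {<a/>}]

# Restrict the root element to <doc>.
grammar start doc
set wrongRoot [grammar validate {<a/>}]
set rightRoot [grammar validate {<doc><a/></doc>}]

# The empty string removes the root element constraint again.
grammar start ""
set anyRootAgain [grammar validate {<a/>}]

grammar delete
```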
- define <definition
script>
- This method allows defining several elements or patterns or a
whole schema with one call. All schema command methods so far
(prefixns, defelement, defelementtype, defpattern, deftexttype and start) are allowed at the top
level of the definition script. The define method itself isn't allowed recursively.
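As an illustration, a small schema with a pattern, an element and a root constraint could be set up in a single define call (all names are invented for this sketch):

```tcl
package require tdom

tdom::schema grammar

grammar define {
    # A key/value pair as reusable pattern.
    defpattern entry {
        element key
        element value
    }
    # <map> holds any number of such pairs.
    defelement map {
        ref entry *
    }
    start map
}

set complete   [grammar validate {<map><key/><value/><key/><value/></map>}]
set incomplete [grammar validate {<map><key/></map>}]

grammar delete
```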
- event (start|end|text)
?event specific data?
- This method allows the validation of hierarchical data against
the content constraints of the validation command.
- start name ?attributes? ?namespace?
- Checks if the current validation state allows the element
name in the namespace to start
here. It raises error if not.
- end
- Checks if the current innermost open element may end there in
the current state without violation of validation constraints. It
raises error if not.
- text text
- Checks if the current validation state allows the given text
content. It raises error if not.
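A sketch of event-based validation, feeding the structure of <doc><title>Hello</title><para/></doc> by hand. The schema is an assumption made for this example:

```tcl
package require tdom

tdom::schema grammar
grammar defelement doc {
    element title ! text
    element para *
}

# Feed the events in document order.
grammar event start doc
grammar event start title
grammar event text "Hello"
grammar event end
grammar event start para
grammar event end
grammar event end

# Query the state after the root element has ended.
set finalState [grammar info vstate]

# Start over and provoke a violation: <para> may not come before <title>.
grammar reset
grammar event start doc
set rejected [catch {grammar event start para} errorMessage]

grammar delete
```

Because each event method raises an error on a violation, wrapping the calls in catch is one way to react to invalid data as soon as it shows up.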
- validate <XML
string> ?objVar?
- Returns true if the <XML string> is
valid, or false, otherwise. If validation has failed and the
optional objVar argument is given, the variable
with that name is set to a validation error message. If the XML
string is valid and the optional objVar argument
is given, the variable with that name is set to the empty
string.
- validatefile filename
?objVar?
- Returns true if the content of filename is
valid, or false, otherwise. The given file is fed as a binary
stream to expat, therefore only US-ASCII, ISO-8859-1, UTF-8 or
UTF-16 encoded data will work with this method. If validation has
failed and the optional objVar argument is given,
the variable with that name is set to a validation error message.
If the file content is valid and the optional objVar
argument is given, the variable with that name is set to the empty
string.
- validatechannel channel
?objVar?
- Returns true if the content read from the Tcl channel channel is valid, or false, otherwise. Since data read out
of a Tcl channel is UTF-8 encoded, any misleading encoding
declaration at the beginning of the data will lead to errors. If
the validation fails and the optional objVar
argument is given, the variable with that name is set to a
validation error message. If the data is valid and the
optional objVar argument is given, the variable
with that name is set to the empty string.
- domvalidate domNode
?objVar?
- Returns true if the first argument is a valid tree, or false,
otherwise. If validation has failed and the optional objVar argument is given, the variable with that name is
set to a validation error message. If the dom tree is valid and the
optional objVar argument is given, the variable
with that name is set to the empty string.
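Post-validation of an already parsed DOM tree, including the optional error message variable, might look like this (schema and documents are illustrative):

```tcl
package require tdom

tdom::schema grammar
grammar defelement doc {
    element a
    element b
}

set goodTree [dom parse {<doc><a/><b/></doc>}]
set badTree  [dom parse {<doc><a/></doc>}]

# On success the variable is set to the empty string.
set goodResult [grammar domvalidate [$goodTree documentElement] goodMessage]
# On failure the variable receives a validation error message.
set badResult  [grammar domvalidate [$badTree documentElement] badMessage]

$goodTree delete
$badTree delete
grammar delete
```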
- reportcmd ?cmd?
- This method expects the name of a Tcl command to be called in
case of validation error. The command will be called with two
arguments appended: the schema command which raises the validation
error, and a validation error code.
The possible error codes are:
- MISSING_ELEMENT
- MISSING_TEXT
- UNEXPECTED_ELEMENT
- UNEXPECTED_ROOT_ELEMENT
- UNEXPECTED_TEXT
- UNKNOWN_ROOT_ELEMENT
- UNKNOWN_ATTRIBUTE
- MISSING_ATTRIBUTE
- INVALID_ATTRIBUTE_VALUE
- DOM_KEYCONSTRAINT
- DOM_XPATH_BOOLEAN
- INVALID_KEYREF
- INVALID_VALUE
- UNKOWN_GLOBAL_ID
- UNKOWN_ID
For more detailed information see section Recovering.
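A sketch of a report command that merely collects the error codes. The proc name collectError and the global variable reported are hypothetical; the two parameters follow the argument order described above:

```tcl
package require tdom

set ::reported {}

# Hypothetical report command: receives the schema command and the error code.
proc collectError {schemaCmd errorCode} {
    lappend ::reported $errorCode
}

tdom::schema grammar
grammar defelement doc {
    element a
    element b
}
grammar reportcmd collectError

# <b> is missing, so the report command is called during validation.
grammar validate {<doc><a/></doc>}
set codes $::reported

grammar delete
```

For this input one would expect the collected list to contain MISSING_ELEMENT.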
- delete
- This method deletes the validation command.
- info ?args?
- This method bundles methods to query the state of and details
about the schema command.
- validationstate
- This method returns the state of the validation command with
respect to validation state. The possible return values and their
meanings are:
- READY
- The validation command is ready to start validation
- VALIDATING
- The validation command is in the process of validating
input.
- FINISHED
- The validation has finished, no further events are
expected.
- vstate
- This method is a shorter alias for validationstate; see
there.
- line
- If the schema command is currently validating, this method
returns the line part of the parsing position information, and the
empty string in all other cases. If the schema command is currently
post-validating a DOM tree, there may be no position information
stored at some or all nodes. The empty string is returned in these
cases.
- column
- If the schema command is currently validating this method
returns the column part of the parsing position information, and
the empty string in all other cases. If the schema command is
currently post-validating a DOM tree, there may be no position
information stored at some or all nodes. The empty string is
returned in these cases.
- domNode
- If the schema command isn't currently post-validating a DOM
tree, this method returns the empty string. Otherwise, if the schema
command waits for the reportcmd script to finish while recovering
from a validation error, it returns the node the validation
engine is currently looking at in case the node is an ELEMENT_NODE
or, if not, its parent node. It is recommended that you do not use
this method, or that you at least leave the DOM tree alone and use it
read-only.
- nrForwardDefinitions
- Returns how many elements, element types and ref patterns are
referenced that aren't defined so far (summed together).
- definedElements
- Returns in no particular order the defined elements in the
grammar as list. If an element is namespaced, its list entry will
be itself a list with two elements, with the name as first and the
namespace as second element.
- definedElementtypes
- Returns in no particular order the defined element types in the
grammar as list. If an element type is namespaced, its list entry
will be itself a list with two elements, with the name as first and
the namespace as second element.
- definedPatterns
- Returns in no particular order the defined named patterns in the
grammar as list. If a named pattern is namespaced, its list entry
will be itself a list with two elements, with the name as first and
the namespace as second element.
- expected
- Returns in no particular order all possible next events (since
the last successful event match, if there was one) as a list. If an
element is namespaced, its list entry will itself be a list with two
elements, with the name as first and the namespace as second
element. If text is a possible next event, the list entry will be a
two-element list, with #text as first element and the empty string
as second. If an any element constraint is possible, the list entry
will be a two-element list, with <any> as first element and
the empty string as second. If an any element in a certain
namespace constraint is possible, the list entry will be a
two-element list, with <any> as first element and the namespace
as second. If element end is a possible event, the list entry will
be a two-element list with <elementend> as first element and
the empty string as second element.
- definition name ?namespace?
- Returns the code that defines the given element. The command
raises error if there is no definition of that element.
- typedefinition name ?namespace?
- Returns the code that defines the given element type
definition. The command raises an error if there is no definition of
that element type.
- patterndefinition name ?namespace?
- Returns the code that defines the given pattern definition. The
command raises error if there is no definition of a pattern with
that name and, if given, namespace.
- vaction ?name|namespace|text?
- This method returns useful information only if the schema
command waits for the reportcmd script to finish while recovering
from a validation error. Otherwise it returns NONE.
If the command is called without the optional argument the
possible return values and their meanings are:
- NONE
- The schema command currently does not recover from a validation
event.
- MATCH_ELEMENT_START
- Element start event, which includes looking for missing or
unknown attributes.
- MATCH_ELEMENT_END
- Element end event.
- MATCH_TEXT
- Validating text between tags.
- MATCH_ATTRIBUTE_TEXT
- Attribute text value constraint check
- MATCH_GLOBAL
- Checking global IDs
- MATCH_DOM_KEYCONSTRAINT
- Checking domunique constraint
- MATCH_DOM_XPATH_BOOLEAN
- Checking domxpathboolean constraint
If called with one of the possible optional arguments, the
command returns detail information depending on current action.
- name
- Returns the name of the element that has to match in case of
MATCH_ELEMENT_START. Returns the name of the closed element in case
of MATCH_ELEMENT_END. Returns the name of the attribute in case of
MATCH_ATTRIBUTE_TEXT. Returns the name of the parent element in
case of MATCH_TEXT.
- namespace
- Returns the namespace of the element that has to match in case
of MATCH_ELEMENT_START. Returns the namespace of the closed element
in case of MATCH_ELEMENT_END. Returns the namespace of the
attribute in case of MATCH_ATTRIBUTE_TEXT. Returns the namespace of
the parent element in case of MATCH_TEXT.
- text
- Returns the text to match in case of MATCH_TEXT. Returns the
value of the attribute in case of MATCH_ATTRIBUTE_TEXT.
- stack top|inside|associated
- In Tcl scripts evaluated by validation this method provides
information about the current validation stack. Called outside this
context the method returns the empty string.
- top
- Returns the element whose content is currently checked (the
open element tag at this moment).
- inside
- Returns all currently open elements as a list.
- associated
- Returns the data associated with the current top most stack
content particle or the empty string if there isn't any.
- reset
- This method resets the validation command into state READY
(while preserving the defined grammar).
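A few of the info sub-methods and reset in action, as a sketch with an illustrative schema:

```tcl
package require tdom

tdom::schema grammar
grammar defelement doc {
    element a
    element b ?
}

set elements    [grammar info definedElements]
set stateBefore [grammar info vstate]

# Start event-based validation and inspect the engine.
grammar event start doc
set stateDuring [grammar info vstate]
set nextEvents  [grammar info expected]

# reset returns the command to READY and keeps the grammar.
grammar reset
set stateAfter    [grammar info vstate]
set elementsAfter [grammar info definedElements]

grammar delete
```

After the start of doc, the list in nextEvents should name the element a, since that is the only content allowed at this point.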
Schema definition scripts are ordinary Tcl scripts evaluated in
the namespace tdom::schema. The schema definition commands listed
below in this Tcl namespace allow the definition of a wide variety
of document structures. Every schema definition command establishes
a validation constraint on the content which has to match or must
be optional to qualify the content as valid. It is a validation
error if there is additional (not matched) content.
White-space-only text (in the XML sense of white space) between any
different tags is ignored, with the exception of text only elements
(for which even white-space-only text will be considered as
significant content).
The schema definition commands are:
- element name ?quant? ?<definition script>?
- If the optional argument definition script is
not given, this command refers to the element defined with defelement with the name name in the
current context namespace. If the definition
script argument is given, the validation constraint expects an
element with the name name in the current
namespace with content "locally" defined by the definition script. Forward references to so far not defined
elements or patterns or other local definitions of the same name
inside the definition script are allowed. If a
forward referenced element is not defined until validation, only an
empty element with name name and namespace
namespace and no attributes matches.
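A sketch of a locally defined element: item has no defelement of its own, its content model is given in place (names are illustrative):

```tcl
package require tdom

tdom::schema grammar
grammar defelement doc {
    # One or more <item> elements with a required id attribute and text content.
    element item + {
        attribute id
        text
    }
}

set valid     [grammar validate {<doc><item id="1">x</item><item id="2"/></doc>}]
# The quantifier + requires at least one <item>.
set noItems   [grammar validate {<doc/>}]
# The attribute id is required by the local definition.
set missingId [grammar validate {<doc><item>x</item></doc>}]

grammar delete
```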
- elementtype name
?quant?
- This command refers to the element defined with defelementtype with the type name name in
the current context namespace. Forward references to so far not
defined element types or recursive references are allowed. If a
forward referenced element type is not defined until validation any
empty element without attributes will be accepted.
- ref name ?quant?
- This command refers to the content particle defined with
defpattern with the name name in
the current context namespace. Forward references to a so far not
defined pattern and recursive references are allowed. If a forward
referenced pattern is not defined until validation no content
whatsoever is expected ("empty match").
- group ?quant? <definition script>
- This command groups a sequence of content particles
defined by the definition script, which have
to match in this sequence order.
- choice ?quant? <definition script>
- This schema constraint matches if one of the top level content
particles defined by the definition script
matches. If one of these top level content particles is optional, this
constraint matches the "empty match".
- interleave ?quant?
<definition script>
- This schema constraint matches after each of the required top
level content particles defined by the definition
script has matched (and, optionally, some or all of the others) in
any arbitrary order.
- mixed ?quant? <definition script>
- This schema constraint matches for any text (including the
empty one) and every top level content particle defined by the
definition script with default quantifier
*.
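The structuring commands group, choice, interleave and mixed can be combined; the following sketch uses invented element names:

```tcl
package require tdom

tdom::schema grammar
grammar define {
    defelement doc {
        # Exactly one of <a> or <b>.
        choice {
            element a
            element b
        }
        # Optionally <c> followed by <d>, in this order.
        group ? {
            element c
            element d
        }
        # <x> and <y>, in any order.
        interleave {
            element x
            element y
        }
    }
    defelement para {
        # Text freely mixed with any number of <em> and <code> elements.
        mixed {
            element em
            element code
        }
    }
}

set full      [grammar validate {<doc><b/><c/><d/><y/><x/></doc>}]
set minimal   [grammar validate {<doc><a/><x/><y/></doc>}]
set noChoice  [grammar validate {<doc><x/><y/></doc>}]
set halfGroup [grammar validate {<doc><a/><c/><x/><y/></doc>}]
set mixedOk   [grammar validate {<para>Some <em/> text <code/> here</para>}]

grammar delete
```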
- text ?<constraint
script>|"type" typename?
- Without the optional constraint script this validation
constraint matches every string (including the empty one). With
constraint script or with a given text type
argument a text matching this script or the text type is
expected.
- any ?namespace?
?quant?
- The any command matches every element (in the namespace
namespace, if that is given) (with whatever
attributes) or subtree, no matter if known within the schema or
not. Please note that if no namespace
argument is given, the quantifiers * and + will eat
up any elements until the enclosing element ends. If you really
have a namespace that looks like a valid tDOM schema quantifier, you
will always have to spell out both arguments.
- attribute name ?quant? (?<constraint script>|"type"
typename?)
- The attribute command defines an attribute (in no namespace) to
the enclosing element. The first definition of name inside an element definition wins; later definitions
of the same name are silently ignored. After the name argument there may be one of the quantifiers ? or !.
If there is, it will be used. Otherwise the attribute will be
required (must be present in the XML source). If there is one
more argument, this argument is evaluated as constraint script,
defining the value constraints of the attribute. Otherwise, if
there are two more arguments and the first of them is the bare-word
"type", the following argument is used as a text type name. This
command is only allowed at top level in the definition script of a
defelement/element script.
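A sketch of attribute definitions with and without quantifier and constraint script (attribute names are illustrative):

```tcl
package require tdom

tdom::schema grammar
grammar defelement item {
    # No quantifier: the attribute is required.
    attribute id
    # Optional attribute whose value must be an integer.
    attribute count ? integer
}

set bothGiven [grammar validate {<item id="a1" count="3"/>}]
set onlyId    [grammar validate {<item id="a1"/>}]
set missingId [grammar validate {<item count="3"/>}]
set badCount  [grammar validate {<item id="a1" count="three"/>}]
set unknown   [grammar validate {<item id="a1" extra="1"/>}]

grammar delete
```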
- nsattribute name
namespace ?quant? (?<constraint script>|"type" typename?)
- This command does the same as the command attribute, for the attribute name in the
namespace namespace.
- namespace URI <definition script>
- Evaluates the definition script with context
namespace URI. Every element, element type or ref
command name will be looked up in the namespace URI, and local defined elements will be in that namespace.
An empty string as URI means no namespace.
- tcl tclcmd ?arg arg ...?
- Evaluates the Tcl script tclcmd arg arg ... .
This validation command is only allowed in strict sequential
context (not in choice, mixed and interleave). If the return code
is something other than TCL_OK, this is an error (which is not
caught and reported by reportcmd).
- self
- Returns the schema command.
- associate data
- This command is only allowed top-level inside definition
scripts of the element, elementtype, pattern or interleave content
particles. Associates the data given as argument
with the currently defined content particle and may be requested in
scripts evaluated while validating the content of that particle
with the schema command method call info stack
associated.
- domunique selector
fieldlist ?name? ?"IGNORE_EMPTY_FIELD_SET"|("EMPTY_FIELD_SET_VALUE"
emptyFieldSetValue)?
- If not postvalidating a DOM tree with domvalidate, this constraint always matches. If
postvalidating, this constraint resembles the xsd key/keyref
mechanism. The selector argument may be any valid
XPath expression (without the xsd limits). Several domunique commands within one element definition are
allowed. They are checked in definition order. The argument name is
available in the recovering script per info vaction
name. If the fieldlist does not select
something for a node of the result set of the selector, the key value will be the empty string by default.
If the arguments EMPTY_FIELD_SET_VALUE
<value> are given, an empty node set will have the key
value value. If instead the flag IGNORE_EMPTY_FIELD_SET is given, an empty node set
result will not have any key value.
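A sketch of a uniqueness constraint on the id attributes of item children. The selector item and the field list @id are assumptions chosen for this example:

```tcl
package require tdom

tdom::schema grammar
grammar defelement doc {
    element item * {
        attribute id
    }
    # Within <doc>, the id values of all <item> children must be unique.
    domunique item @id
}

set uniqueTree    [dom parse {<doc><item id="a"/><item id="b"/></doc>}]
set duplicateTree [dom parse {<doc><item id="a"/><item id="a"/></doc>}]

set uniqueOk    [grammar domvalidate [$uniqueTree documentElement]]
set duplicateOk [grammar domvalidate [$duplicateTree documentElement] message]

# Outside of DOM post-validation the constraint always matches.
set streamOk [grammar validate {<doc><item id="a"/><item id="a"/></doc>}]

$uniqueTree delete
$duplicateTree delete
grammar delete
```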
- domxpathboolean XPath_expr ?name?
- If not postvalidating a DOM tree with domvalidate, this constraint always matches. If
postvalidating, the XPath_expr argument is
evaluated (with the node matching the schema parent of the
domxpathboolean command as context node). The
constraint matches if the result of this XPath expression, converted
to boolean by XPath rules, is true. Several domxpathboolean commands within one element definition are
allowed. They are checked in definition order.
This enables checks depending on more than one element.
Consider
    tdom::schema s
    s define {
        defelement doc {
            element a ! text
            element b ! text
            element c ! text
            domxpathboolean "a * b * c >= 20000" volume
            domxpathboolean "a > b and b > c" sequence
        }
    }
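A sketch of how the schema from this example might be put to use. The sample values are invented; 40 * 30 * 20 is 24000 and the values are strictly decreasing, while 10 * 20 * 30 is only 6000:

```tcl
package require tdom

tdom::schema s
s define {
    defelement doc {
        element a ! text
        element b ! text
        element c ! text
        domxpathboolean "a * b * c >= 20000" volume
        domxpathboolean "a > b and b > c" sequence
    }
}

set goodTree [dom parse {<doc><a>40</a><b>30</b><c>20</c></doc>}]
set badTree  [dom parse {<doc><a>10</a><b>20</b><c>30</c></doc>}]

# The XPath constraints are only evaluated while post-validating a DOM tree.
set goodResult [s domvalidate [$goodTree documentElement]]
set badResult  [s domvalidate [$badTree documentElement] message]

# Validating the XML string directly does not evaluate them.
set streamResult [s validate {<doc><a>10</a><b>20</b><c>30</c></doc>}]

$goodTree delete
$badTree delete
s delete
```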
- prefixns ?prefixUriList?
- This defines a prefix to namespace URI mapping exactly as a
schemacmd prefixns would. It is meant as top-level command
of a schemacmd define script. This command is not allowed
nested in another definition script command and will raise error,
if you call it there.
- defelement name
?namespace? <definition
script>
- This defines an element exactly as a schemacmd
defelement call would. It is meant as top-level command of a
schemacmd define script. This command is not allowed nested
in another definition script command and will raise error, if you
call it there.
- defelementtype name
?namespace? <definition
script>
- This defines an elementtype exactly as a schemacmd
defelementtype call would. It is meant as top-level command of
a schemacmd define script. This command is not allowed
nested in another definition script command and will raise error,
if you call it there.
- defpattern name
?namespace? <definition
script>
- This defines a named pattern exactly as a schemacmd
defpattern call would. It is meant as top-level command of a
schemacmd define script. This command is not allowed nested
in another definition script command and will raise error, if you
call it there.
- deftexttype name
<constraint script>
- This defines a named bundle of text constraints exactly as a
schemacmd deftexttype call would. It is meant as top-level
command of a schemacmd define script. This command is not
allowed nested in another definition script command and will raise
error, if you call it there.
- start name ?namespace?
- This command works exactly as a schemacmd start call
would. It is meant as top-level command of a schemacmd
define script. This command is not allowed nested in another
definition script command and will raise error, if you call it
there.
Several schema definition commands expect a quantifier as one of
their arguments which determines how often the content particle
specified by the command is expected. The valid values for a
quant argument are:
- !
- The content particle has to occur exactly once in valid
documents.
- ?
- The content particle may not occur more than once in valid
documents - the particle is optional.
- *
- The content particle may occur zero or more times in a row in
valid documents.
- +
- The content particle may occur one or more times in a row in
valid documents.
- n
- The content particle must occur n times in a row in valid
documents. The quantifier must be an integer greater than zero.
- {n m}
- The content particle must occur at least n and at most m times
in a row in valid documents. The quantifier must be a Tcl list with
two elements. Both elements must be integers, with n >= 0 and n
< m.
If an optional quantifier is not given, it defaults to * in case
of the mixed command and to ! for all other
commands.
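The quantifiers in one illustrative schema (element names are arbitrary):

```tcl
package require tdom

tdom::schema grammar
grammar defelement doc {
    element a          ;# default quantifier !: exactly once
    element b ?        ;# zero or one time
    element c *        ;# zero or more times
    element d 2        ;# exactly two times
    element e {1 3}    ;# one to three times
}

set minimal [grammar validate {<doc><a/><d/><d/><e/></doc>}]
set maximal [grammar validate \
    {<doc><a/><b/><c/><c/><d/><d/><e/><e/><e/></doc>}]
# Only one <d> although two are required.
set tooFewD [grammar validate {<doc><a/><d/><e/></doc>}]
# Four <e> although at most three are allowed.
set tooManyE [grammar validate {<doc><a/><d/><d/><e/><e/><e/><e/></doc>}]

grammar delete
```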
Text (parsed character data, as XML calls it) sometimes has to
be of a certain kind or comply with certain rules to be valid. The
text constraint script arguments to text, attribute, nsattribute
and deftexttype commands are evaluated in the Tcl namespace
tdom::schema::text and allow the ensuing
text constraint commands to check text for certain properties. The
commands are defined in the Tcl namespace tdom::schema::text. They raise an error if they are
called outside of a text constraint script.
A few of the ensuing text type commands are exposed as general
Tcl commands. They are defined in the namespace tdom::type and are
called as documented below with the text to check appended to the
argument list. They return a logical value. Please note that the
commands may not accept starting or ending white space. Whether a
command is available in the tdom::type namespace is noted in its
documentation.
The tcl text constraint command dispatches the
check to an arbitrary Tcl command, thus enabling any programmable
decision rules.
- tcl tclcmd ?arg arg ...?
- Evaluates the Tcl script tclcmd arg arg ...
and the text to validate appended to the argument list. The return
value of the Tcl command is interpreted as a boolean.
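A sketch of a text check delegated to a Tcl proc. The proc ::isPalindrome is a hypothetical helper written for this example and is addressed by its fully qualified name:

```tcl
package require tdom

# Hypothetical check: the text to validate arrives as last argument.
proc ::isPalindrome {text} {
    return [expr {$text eq [string reverse $text]}]
}

tdom::schema grammar
grammar defelement word {
    text {
        tcl ::isPalindrome
    }
}

set accepted [grammar validate {<word>level</word>}]
set rejected [grammar validate {<word>tdom</word>}]

grammar delete
```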
- name
- This text constraint matches if the text value matches the XML
name production https://www.w3.org/TR/xml/#NT-Name.
This means that the text value must start with a letter, underscore
(_), or colon (:), and may contain only letters, digits,
underscores (_), colons (:), hyphens (-), and periods (.).
- ncname
- This text constraint matches if the text value matches the XML
ncname production https://www.w3.org/TR/xml-names/#NT-NCName.
This means that the text value must start with a letter or
underscore (_), and may contain only letters, digits, underscores
(_), hyphens (-), and periods (.) (The only difference to the name
constraint is that colons are not permitted.)
- qname
- This text constraint matches if the text value matches the XML
qname production https://www.w3.org/TR/xml-names/#NT-QName.
This means that the text value is either a ncname or two ncnames
joined by a colon (:).
- nmtoken
- This text constraint matches if the text value matches the XML
nmtoken production https://www.w3.org/TR/xml/#NT-Nmtoken
- nmtokens
- This text constraint matches if the text value matches the XML
nmtokens production https://www.w3.org/TR/xml/#NT-Nmtokens
- integer ?(xsd|tcl)?
- This text constraint matches if the text value could be parsed
as an integer. If the optional argument to the command is tcl, everything that returns TCL_OK if fed into
Tcl_GetInt() matches. If the optional argument to the command is
xsd, the constraint matches if the value is a
valid xsd:integer. Without argument xsd is the
default.
- negativeInteger ?(xsd|tcl)?
- This text constraint matches the same text values as the
integer text constraint (see there), with the
additional constraint, that the value must be < zero.
- nonNegativeInteger ?(xsd|tcl)?
- This text constraint matches the same text values as the
integer text constraint (see there), with the
additional constraint, that the value must be >= zero.
- nonPositiveInteger ?(xsd|tcl)?
- This text constraint matches the same text values as the
integer text constraint (see there), with the
additional constraint, that the value must be <= zero.
- positiveInteger ?(xsd|tcl)?
- This text constraint matches the same text values as the
integer text constraint (see there), with the
additional constraint, that the value must be > zero.
- number ?(xsd|tcl)?
- This text constraint matches if the text value could be parsed
as a number. If the optional argument to the command is tcl, everything that returns TCL_OK if fed into
Tcl_GetDouble() matches. If the optional argument to the command is
xsd, the constraint matches if the value is a
valid xsd:decimal. Without argument xsd is the
default.
- boolean ?(xsd|tcl)?
- This text constraint matches if the text value could be parsed
as a boolean. If the optional argument to the command is tcl, everything that returns TCL_OK if fed into
Tcl_GetBoolean() matches. If the optional argument to the command
is xsd, the constraint matches if the value is a
valid xsd:boolean. Without argument xsd is the
default.
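The difference between the xsd and tcl variants can be sketched with values that only Tcl accepts, such as a hexadecimal integer or the boolean word yes (element names are invented):

```tcl
package require tdom

tdom::schema grammar
grammar define {
    defelement xsdInt  { text integer }
    defelement tclInt  { text {integer tcl} }
    defelement xsdFlag { text boolean }
    defelement tclFlag { text {boolean tcl} }
}

set decimalXsd [grammar validate {<xsdInt>42</xsdInt>}]
# Hexadecimal notation is understood by Tcl_GetInt() but is no xsd:integer.
set hexXsd [grammar validate {<xsdInt>0x2A</xsdInt>}]
set hexTcl [grammar validate {<tclInt>0x2A</tclInt>}]
# "yes" is a Tcl boolean but no xsd:boolean.
set yesXsd [grammar validate {<xsdFlag>yes</xsdFlag>}]
set yesTcl [grammar validate {<tclFlag>yes</tclFlag>}]

grammar delete
```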
- date
- This text constraint matches if the text value is a xsd:date,
which is basically like an ISO 8601 date of the form YYYY-MM-DD,
with optional time zone part (either the letter Z or plus (+) or
minus (-) followed by hh:mm and with maximum allowed positive or
negative time zone 14:00). It follows the date rules of the
Gregorian calendar for all dates. A preceding minus sign for bce
dates is allowed. There is no year 0. The year may have more than 4
digits, but only if needed (no extra leading zeros). This is
available as common Tcl command tdom::type::date.
- time
- This text constraint matches if the text value is a xsd:time,
which is basically like an ISO 8601 time of the form hh:mm:ss with
optional time zone part. The time zone part follows the rules of the
date command; see there. All three parts of the
time value (hours, minutes, seconds) must be spelled out with 2
digits. Additional fractional seconds (with a point ('.') as
separator) are allowed, but not just a dangling point. The time
value 24:00:00 (without fractional part) is allowed. This is
available as common Tcl command tdom::type::time.
- dateTime
- This text constraint matches if the text value is a
xsd:dateTime, which is basically like an ISO 8601 date time of the
form YYYY-MM-DDThh:mm:ss with optional time zone part. The date and
time zone parts follow the rules of the date and
time commands; see there. The time part (including
the signaling 'T' character) is mandatory. This is available as
common Tcl command tdom::type::dateTime.
- duration
- This text constraint matches if the text value is a
xsd:duration, which is basically like an ISO 8601 duration of the
form PnYnMnDTnHnMnS. All parts other than the starting P and - if
one of H, M or S is given - T are optional. In case the following
sign letter is S, n may be a decimal (with at least one digit
before and after the dot), otherwise it must be a (positive)
integer. This is available as common Tcl command
tdom::type::duration.
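The four commands exposed in the tdom::type namespace can be called directly, with the text to check as argument. A sketch with sample values chosen for illustration:

```tcl
package require tdom

# 2024 is a leap year in the Gregorian calendar, 2023 is not.
set leapDay  [tdom::type::date 2024-02-29]
set noSuchDay [tdom::type::date 2023-02-29]

# All three parts of a time value must be spelled out.
set fullTime  [tdom::type::time 12:30:00]
set shortTime [tdom::type::time 12:30]

# The time part of a dateTime value is mandatory.
set withTime    [tdom::type::dateTime 2024-02-29T12:30:00Z]
set withoutTime [tdom::type::dateTime 2024-02-29]

# A duration has to start with P.
set goodDuration [tdom::type::duration P1Y2M3DT4H5M6.5S]
set badDuration  [tdom::type::duration 1Y]
```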
- base64
- This text constraint matches if text is valid according to RFC
4648.
- hexBinary
- This text constraint matches if text is a sequence of binary
octets in hexadecimal encoding, where each binary octet is a
two-character hexadecimal number. Lowercase and uppercase letters A
through F are permitted.
- unsignedByte
- This text constraint matches if the text value is a
xsd:unsignedByte. This is an integer between 0 and 255, both
included, optionally preceded by a + sign and leading zeros.
- unsignedShort
- This text constraint matches if the text value is a
xsd:unsignedShort. This is an integer between 0 and 65535, both
included, optionally preceded by a + sign and leading zeros.
- unsignedInt
- This text constraint matches if the text value is a
xsd:unsignedInt. This is an integer between 0 and 4294967295, both
included, optionally preceded by a + sign and leading zeros.
- unsignedLong
- This text constraint matches if the text value is a
xsd:unsignedLong. This is an integer between 0 and
18446744073709551615, both included, optionally preceded by a +
sign and leading zeros.
- oneOf <constraint
script>
- This text constraint matches if one of the text constraints
defined in the argument constraint script matches
the text. It stops after the first match and probes the text
constraints in the order of definition.
- allOf <constraint
script>
- This text constraint matches if all of the text constraints
defined in the argument constraint script match
the text. It stops after the first match failure and probes the
text constraints in the order of definition. Since the schema
definition command text also expects all text
constraints to match the text constraint, allOf is
useful mostly in connection with the oneOf text
constraint command.
- not <constraint script>
- This text constraint matches if none of the text constraints
defined in the argument constraint script matches
the text. The text constraints in the constraint script are
probed in the order of definition. Probing stops at the first
matching constraint, and a validation error is reported.
- whitespace (preserve|replace|collapse) <constraint script>
- This text constraint command normalizes the white-space of the
text value and checks the resulting text with the text constraints
of the constraint script argument. White-space characters are
#x20 (space, ' '), #x9 (tab, \t), #xA (linefeed, \n), and #xD
(carriage return, \r). The normalization method preserve keeps
everything as it is; this is another way to say allOf. The
replace normalization method replaces every white-space
character (as above) with a space. The collapse normalization
method removes all leading and trailing white-space and replaces
every other sequence of contiguous white-space with a single space.
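The effect of the collapse method can be sketched by checking the same padded text with and without normalization. The element names strict and loose are made up for this example, which relies on either defined element being usable as the document element because no start element is set.

```tcl
package require tdom

tdom::schema s
s define {
    # Compares the raw text value.
    defelement strict {
        text {fixed "Hello World"}
    }
    # Compares the text value after white-space collapsing.
    defelement loose {
        text {
            whitespace collapse {
                fixed "Hello World"
            }
        }
    }
}

# Leading, trailing and repeated inner white-space (space, tab, linefeed).
set padded "  Hello \t\n  World  "

set okStrict [s validate "<strict>$padded</strict>"]
set okLoose  [s validate "<loose>$padded</loose>"]

puts [list $okStrict $okLoose]
s delete
```

Only the loose element is expected to accept the padded text.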
- split ?type ?args?? <constraint script>
- This text constraint command splits the text to test into a list
of values and tests all elements of that list for the text
constraints in the evaluated constraint script.
The available types are:
- whitespace
- The text to split is stripped of all white space at start and
end and split into a list at any sequence of successive white space.
- tcl tclcmd ?arg ...?
- The text to split is handed to the tclcmd,
which is evaluated on global level, appended with every given arg
and the text to split as last argument. This call must return a
valid Tcl list, whose elements are tested.
The default in case no split type argument is given is whitespace.
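A sketch of both split types, assuming two made-up elements and a hypothetical helper proc splitOnComma that is not part of tDOM:

```tcl
package require tdom

# Hypothetical helper: receives the text to split as its last argument
# and returns a Tcl list.
proc splitOnComma {text} {
    return [split $text ,]
}

tdom::schema s
s define {
    # Default split type: whitespace.
    defelement sizes {
        text {split {integer}}
    }
    # Split type tcl: the list is produced by splitOnComma.
    defelement csv {
        text {split tcl splitOnComma {integer}}
    }
}

set okWs   [s validate {<sizes>  1 20   300 </sizes>}]
set badWs  [s validate {<sizes>1 two 3</sizes>}]
set okCsv  [s validate {<csv>1,20,300</csv>}]
set badCsv [s validate {<csv>1,,3</csv>}]  ;# the empty item is no integer

puts [list $okWs $badWs $okCsv $badCsv]
s delete
```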
- strip <constraint script>
- This text constraint command tests all text constraints in the
evaluated constraint script with the text to
test stripped of all white space at start and end.
- fixed value
- The text constraint only matches if the text value is string
equal to the given value.
- enumeration list
- This text constraint matches if the text value is equal to one
element (respecting case and any white-space) of the argument
list, which has to be a valid Tcl list.
- match ?-nocase? <glob style match pattern>
- This text constraint matches if the text value matches the glob
style pattern given as argument. It follows the rules of the Tcl
[string match] command, see https://www.tcl.tk/man/tcl8.6/TclCmd/string.htm#M35.
- regexp expression
- This text constraint matches if the text value matches the
regular expression given as argument. The regular expression
syntax is described at https://www.tcl.tk/man/tcl8.6/TclCmd/re_syntax.htm.
- length length
- This text constraint matches if the length of the text value
(in characters, not bytes) is length. The length
argument must be a positive integer or zero.
- maxLength length
- This text constraint matches if the length of the text value
(in characters, not bytes) is at most length. The
length argument must be an integer greater than zero.
- minLength length
- This text constraint matches if the length of the text value
(in characters, not bytes) is at least length. The
length argument must be an integer greater than zero.
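The value and length constraints above can be combined freely. A minimal sketch, assuming a made-up product element with a part number attribute, an optional unit attribute and non-empty text of at most 40 characters:

```tcl
package require tdom

tdom::schema s
s define {
    defelement product {
        # The braces keep Tcl from substituting \d and [A-Z].
        attribute partNum ! {regexp {^\d{3}-[A-Z]{2}$}}
        attribute unit ? {enumeration {kg g lb}}
        text {
            minLength 1
            maxLength 40
        }
    }
}

set okAll  [s validate {<product partNum="926-AA" unit="kg">Lawnmower</product>}]
set badSku [s validate {<product partNum="926-aa">Lawnmower</product>}]
set badUnit [s validate {<product partNum="926-AA" unit="oz">Lawnmower</product>}]

puts [list $okAll $badSku $badUnit]
s delete
```

The regular expression is anchored with ^ and $ in this sketch so that the whole attribute value has to match, not just a part of it.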
- id ?keySpace?
- This text constraint command marks the text as a document wide
ID (to be referenced by an idref). Every ID value within a document
must be unique. It isn't an error if the ID isn't actually
referenced within the document. The optional argument keySpace
does all this for a named key space. The key space "" (the empty
string) is a different key space than the one used by the id
command without keySpace argument.
- idref ?keySpace?
- This text constraint command expects the text to be a reference
to an ID within the document. The referenced ID may appear later in
the document than the reference. Several references within the
document to one ID are possible.
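A sketch of id and idref on attribute values, assuming made-up item and link elements:

```tcl
package require tdom

tdom::schema s
s define {
    defelement doc {
        element item * {
            attribute id ! {id}
        }
        element link * {
            attribute to ! {idref}
        }
    }
}

# Unique IDs, and the reference points to an existing ID.
set okRefs [s validate {<doc><item id="a"/><item id="b"/><link to="a"/></doc>}]
# The ID value a is used twice.
set dupId [s validate {<doc><item id="a"/><item id="a"/></doc>} dupMsg]
# The reference z has no matching ID anywhere in the document.
set dangling [s validate {<doc><item id="a"/><link to="z"/></doc>} refMsg]

puts [list $okRefs $dupId $dangling]
s delete
```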
Document wide uniqueness and foreign key constraints are
available with the text constraint commands id and idref. Keyspaces
allow for sub-tree local uniqueness and foreign key
constraints.
- keyspace <names list> <constraint script>
- Any number of keyspaces are possible. A keyspace is either
active or not. A keyspace command nested inside the constraint
script of a keyspace with the same name does nothing.
These text constraint commands work with keyspaces:
- key <name>
- If the keyspace with the name <name> is
not active, the constraint always matches. If the keyspace is
active, it reports an error if the keyspace already holds a key
with this value. Otherwise, it stores the value as key in this
keyspace and matches.
- keyref <name>
- If the keyspace with the name <name> is
not active, the constraint always matches. If the keyspace is
active, it reports an error if there is still no key with this
value at the end of the keyspace <name>. Otherwise, it matches.
By default the validation engine stops at the first detected
validation violation and reports that finding. If the schema
command itself is used to validate input, it does so by returning
false (and, if a result variable is given, by setting it to an
error message). If the schema command is used by a SAX parser or
the DOM parser, it does so by throwing an error.
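Both reporting paths can be sketched with one small schema. The element names and the sample document are made up for this example.

```tcl
package require tdom

tdom::schema s
s define {
    defelement doc {
        element n ! {text {integer}}
    }
}

# Well-formed, but abc is not an integer.
set xml {<doc><n>abc</n></doc>}

# Used directly, the schema command returns false and fills msg.
set ok [s validate $xml msg]
puts "validate returned $ok: $msg"

# Used through the DOM parser, the same violation is raised as an error.
set rc [catch {dom parse -validateCmd s $xml} parseErr]
puts "dom parse return code $rc: $parseErr"

s delete
```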
If a reportcmd is set, this command is called on
global level, with the schema command and an error type appended
as arguments, whenever a validation violation is detected. Then the
validation recovers from the error and continues. For some
validation errors the recovery strategy can be determined with the
script result of the reportcmd.
With a reportcmd (which does not throw an error if
called) the validation engine will never report validation failure
to its caller. The validation engine recovers, continues, and
reports the next error (if one occurs) and so on until the end of
the input. The schema command will return true and the SAX parser
and DOM builder will process normally until the end of the input,
as if there had not been a validation error.
Please note that this happens only for validation errors. It is
not possible to recover from well-formedness errors. If the input
is not well-formed, the schema command returns false and sets (if
given) the result variable with an error message about the
well-formedness error.
If the reportcmd throws an error while called by
the validation engine, then validation stops and the schema command
throws an error with the error message of the script.
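A minimal sketch of a report command that only collects what it is told. The proc name collect and the global variable errors are made up for this example, and the exact error type codes are not asserted here.

```tcl
package require tdom

set ::errors {}

# Called with the schema command and the error type appended.
proc collect {schemaCmd errorType} {
    lappend ::errors [list $errorType [$schemaCmd info vaction]]
    # Empty script result: leave the recovery to the default strategy.
    return
}

tdom::schema s
s define {
    defelement doc {
        element n + {text {integer}}
    }
}
s reportcmd collect

# Two of the three n elements violate the integer constraint.
set ok [s validate {<doc><n>1</n><n>abc</n><n>xyz</n></doc>}]

puts "validate returned $ok"
foreach e $::errors {
    puts "reported: $e"
}
s delete
```

Because the report command does not throw an error, validate is expected to return true here, and the collected list is the only trace of the violations.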
While validating, basically three events can happen: an element
start tag has to match, a piece of text has to match, or an element
end tag has to match. The method info vaction,
called in the recovering script or in any script code called from
there, returns which event has triggered the error report
(MATCH_ELEMENT_START, MATCH_TEXT, MATCH_ELEMENT_END, respectively).
While the command walks through the schema to check whether the
event matches, other, data driven events may happen (for example,
checking the keyrefs within a keyspace).
Several of the validation error codes, appended as second
argument to the reportcmd calls, may happen at
more than one kind of validation event. The info
vaction method and its subcommands provide information about
the current validation event, if called from the report
command.
If a structural validation error happens, the default recovering
strategy is to ignore any following (or missing) content within the
current subtree and to continue with the element end event of the
subtree.
Returning "ignore" from the recovering script in case of error
type MISSING_ELEMENT recovers by ignoring the failed constraint and
continues to match the event further against the schema.
Returning "vanish" from the recovering script in case of the error
types MISSING_ELEMENT and UNEXPECTED_ELEMENT recovers by ignoring
the event.
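A sketch of a recovering script that chooses the "vanish" strategy for stray elements. The proc name recover, the variable seen and the sample document are made up, and which of the two error types the engine reports for the stray element is deliberately not asserted.

```tcl
package require tdom

set ::seen {}

proc recover {schemaCmd errorType} {
    lappend ::seen $errorType
    # "vanish" is documented for these two error types only.
    if {$errorType in {MISSING_ELEMENT UNEXPECTED_ELEMENT}} {
        return vanish
    }
    return
}

tdom::schema s
s define {
    defelement doc {
        element a ! {text}
        element b ! {text}
    }
}
s reportcmd recover

# The element x is not part of the content model of doc.
set ok [s validate {<doc><a>1</a><x>?</x><b>2</b></doc>}]

puts "validate returned $ok, reported: $::seen"
s delete
```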
The XML Schema Part 0: Primer Second Edition (https://www.w3.org/TR/xmlschema-0/)
starts with this example schema:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:annotation>
    <xsd:documentation xml:lang="en">
      Purchase order schema for Example.com.
      Copyright 2000 Example.com. All rights reserved.
    </xsd:documentation>
  </xsd:annotation>
  <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>
  <xsd:element name="comment" type="xsd:string"/>
  <xsd:complexType name="PurchaseOrderType">
    <xsd:sequence>
      <xsd:element name="shipTo" type="USAddress"/>
      <xsd:element name="billTo" type="USAddress"/>
      <xsd:element ref="comment" minOccurs="0"/>
      <xsd:element name="items" type="Items"/>
    </xsd:sequence>
    <xsd:attribute name="orderDate" type="xsd:date"/>
  </xsd:complexType>
  <xsd:complexType name="USAddress">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="street" type="xsd:string"/>
      <xsd:element name="city" type="xsd:string"/>
      <xsd:element name="state" type="xsd:string"/>
      <xsd:element name="zip" type="xsd:decimal"/>
    </xsd:sequence>
    <xsd:attribute name="country" type="xsd:NMTOKEN"
                   fixed="US"/>
  </xsd:complexType>
  <xsd:complexType name="Items">
    <xsd:sequence>
      <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="productName" type="xsd:string"/>
            <xsd:element name="quantity">
              <xsd:simpleType>
                <xsd:restriction base="xsd:positiveInteger">
                  <xsd:maxExclusive value="100"/>
                </xsd:restriction>
              </xsd:simpleType>
            </xsd:element>
            <xsd:element name="USPrice" type="xsd:decimal"/>
            <xsd:element ref="comment" minOccurs="0"/>
            <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
          </xsd:sequence>
          <xsd:attribute name="partNum" type="SKU" use="required"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
  <!-- Stock Keeping Unit, a code for identifying products -->
  <xsd:simpleType name="SKU">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{3}-[A-Z]{2}"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
A nearly one-to-one translation of that into a tDOM schema
definition script would be:
tdom::schema schema
schema define {

    # Purchase order schema for Example.com.
    # Copyright 2000 Example.com. All rights reserved.

    defelement purchaseOrder {ref PurchaseOrderType}

    foreach elm {comment name street city state productName} {
        defelement $elm text
    }

    defpattern PurchaseOrderType {
        element shipTo ! {ref USAddress}
        element billTo ! {ref USAddress}
        element comment ?
        element items
        attribute orderDate date
    }

    defpattern USAddress {
        element name
        element street
        element city
        element state
        element zip ! {text number}
        attribute country ! {fixed "US"}
    }

    defelement items {
        element item * {
            element productName
            # The XSD additionally restricts quantity to 1..99.
            element quantity ! {text integer}
            element USPrice ! {text number}
            element comment ?
            element shipDate ? {text date}
            # Braces keep Tcl from substituting \d and [A-Z].
            attribute partNum ! {regexp {^\d{3}-[A-Z]{2}$}}
        }
    }
}
The RELAX NG Tutorial (http://relaxng.org/tutorial-20011203.html)
starts with this example:
Consider a simple XML representation of an email address book:
<addressBook>
  <card>
    <name>John Smith</name>
    <email>js@example.com</email>
  </card>
  <card>
    <name>Fred Bloggs</name>
    <email>fb@example.com</email>
  </card>
</addressBook>
The DTD would be as follows:
<!DOCTYPE addressBook [
<!ELEMENT addressBook (card*)>
<!ELEMENT card (name, email)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT email (#PCDATA)>
]>
A RELAX NG pattern for this could be written as follows:
<element name="addressBook" xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
    <element name="card">
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>
This schema definition script will do the same:
tdom::schema schema
schema define {
    defelement addressBook {
        element card *
    }
    defelement card {
        element name
        element email
    }
    foreach e {name email} {
        defelement $e text
    }
}
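To round this off, a short sketch of how such a schema command could be used to validate the address book. The command name abook, the variable names and the mail addresses are made up for this example; the definition script is the one shown above.

```tcl
package require tdom

tdom::schema abook
abook define {
    defelement addressBook {
        element card *
    }
    defelement card {
        element name
        element email
    }
    foreach e {name email} {
        defelement $e text
    }
}

set good {<addressBook>
  <card>
    <name>John Smith</name>
    <email>js@example.com</email>
  </card>
</addressBook>}

# The email element is missing in this card.
set bad {<addressBook>
  <card>
    <name>Fred Bloggs</name>
  </card>
</addressBook>}

set okGood [abook validate $good]
set okBad  [abook validate $bad msg]

puts "good: $okGood"
puts "bad: $okBad ($msg)"
abook delete
```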
Validation, Postvalidation, DOM, SAX