Geeks With Blogs
Josh Reuben

About 15 years ago, software architect bright sparks proposed transforming XML data into HTML as the mutt's nuts for web apps. Of course, upon implementation, web devs literally retched at the complexity and moved on to other shiny new things (Web Forms?!?).

HOWEVER, it must be stressed that XSLT is a [neglected] INCREDIBLY POWERFUL tool for semi-structured data transformation, and its cousin XQuery is an awesome weapon for querying hierarchical data.


So, here are my reference notes:

XPath

  • Overview - syntax for selecting a subset of nodes in a document (querying an XML document). Used in XSLT & XPointer; precursor of XQuery

syntax:

<axis>::<search element | node test> [<predicate expression>]

eg:

descendant::elementX get elementX from all descendants

  • context node element - starting point in the node tree from where to perform the XPath query. Specifies the document subset to query. Can be the root

  • axes - specify the search direction. Can be: child, descendant, parent, ancestor, following-sibling, preceding-sibling, following, preceding, attribute (contains the attributes of the context node), namespace (contains the namespace nodes of the context node), self (only the context node itself), descendant-or-self, ancestor-or-self

  • all nodes (the entire document tree) can be partitioned into 5 axes: ancestor, descendant, following, preceding & self

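  • node tests (node type functions) - can be: node() (true for any node on the axis - not very selective!), text(), comment(), processing-instruction(), a name such as elementX, or * (true for any node of the principal type). Note: element() and attribute() are XPath 2.0 kind tests, and cdata() is not part of XPath (the XPath data model has no separate CDATA nodes)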

  • principal node type - every axis has its own principal node type. For most axes the principal node type is ‘element’, but for the attribute axis it is ‘attribute’ and for the namespace axis it is ‘namespace’. eg

descendant-or-self::*

attribute::* - fetches all attributes in the context node

attribute::attributeX - fetches the attributeX attribute of the context node

  • constructing paths - XPath expressions can be appended to each other to form a longer expression by separating them with a forward slash. The 1st expression in the path is evaluated in the original context, and the result set of the expression forms the context of the next etc. Each of the nodes in the resultset is used as the context of the following expression

descendant::elementX/parent::*

  • Absolute path - Set an absolute path by prefixing a slash. This sets the expression context to the document root (the parent of the root element, not the root element itself). Eg /attribute::* selects nothing because the document root has no attributes, whereas /child::*/attribute::* selects the attributes of the root element

  • Abbreviated form - Makes queries shorter.

    • There is no default node test: every step needs one, such as a name, * or node()

    • The default axis is child

child::elementX = elementX

  • The attribute axis can be abbreviated to the prefix @

attribute::attributeX = @attributeX

attribute::* = @*

  • The self axis can be abbreviated to .

self::node() = .

  • The parent axis can be abbreviated to ..

parent::node() = ..

  • The descendant axis can be reached with // (strictly, // is short for /descendant-or-self::node()/)

descendant::* = .//*

/descendant::* = //*
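The abbreviated forms above can be tried out quickly in Python. This is only a sketch: the standard library's xml.etree.ElementTree implements a small subset of XPath (no explicit axis names, and @name only inside predicates), and the sample document and its element names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are made up.
doc = ET.fromstring(
    "<root>"
    "<group name='g1'><elementX id='a'/><elementX id='b'/></group>"
    "<group name='g2'><elementX id='c'/></group>"
    "</root>"
)

# The child axis is the default: 'group' means child::group
groups = doc.findall("group")

# './/elementX' selects all elementX descendants of the context node
all_x = doc.findall(".//elementX")

# '..' steps to the parent, so this mirrors descendant::elementX/parent::*
parents = doc.findall(".//elementX/..")

# ElementTree cannot select attribute nodes with a path step,
# so attribute values are read with .get() instead of @id
ids = [x.get("id") for x in all_x]

print([g.get("name") for g in groups])   # ['g1', 'g2']
print(ids)                               # ['a', 'b', 'c']
print([p.get("name") for p in parents])  # ['g1', 'g2']
```

Each parent appears only once in the last result, even though g1 has two elementX children, because a path step returns a set of nodes rather than one result per input node.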

  • predicate expressions for selecting subsets - a way to optionally filter a subset from an XPath resultset. the predicate expression is appended in square brackets [ ]. predicate expressions can contain numeric values, XPath functions & XPath expressions! The filtered resultset from an XPath expression with a predicate expression can be further filtered by appending another predicate to it

    • Can use the following operators: = != <= >= > < and or + - * div mod | (union)

    • Boolean expressions - If the expression evaluates to true then the node remains in the resultset

child::elementX[position() <= 2] – returns only the 1st 2 elements found (positions start at 1)

child::elementX[count(child::elementY) > 2] – returns only the elements that have more than 2 elementY children (count() needs a node set argument)

  • integer expressions

child::elementX[2] – returns only the 2nd element found

child::elementX[last()] – returns only the last element found

  • node set expressions - if the result of the expression is a node set, then the context node is included if there are nodes in the node set - allows for subquerying! If 2 node sets are compared with =, the result is true if any one node from the 1st has the same string value as any one node from the 2nd

child::elementX[child::elementY] – returns only the elements found that have elementY subelements

child::elementX[attribute::attributeX='xxx'] – returns only the elements found that have a specific attribute value

    • String and number values - If the comparison value is a number (eg [attribute::attributeX=1] ) then the node’s string value is converted & numerically compared. If it is a string literal (eg [attribute::attributeX='xxx'] or [attribute::attributeX='1'] ) then the expression is true if the string values are identical
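A small Python sketch of the predicate forms above, assuming a made-up document. ElementTree supports positional predicates, last(), child-element tests and attribute comparisons, but not position() or count(), so only those forms are shown. The n attribute exists only so the results are easy to read.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the n attribute just labels each element.
doc = ET.fromstring(
    "<root>"
    "<elementX n='1' attributeX='xxx'><elementY/></elementX>"
    "<elementX n='2' attributeX='yyy'/>"
    "<elementX n='3' attributeX='xxx'/>"
    "</root>"
)

def labels(nodes):
    """Return the n label of each selected element."""
    return [e.get("n") for e in nodes]

second = doc.findall("elementX[2]")                      # positions are 1-based
last = doc.findall("elementX[last()]")                   # the last elementX child
with_child = doc.findall("elementX[elementY]")           # has an elementY child
by_attr = doc.findall("elementX[@attributeX='xxx']")     # attribute value test

print(labels(second))      # ['2']
print(labels(last))        # ['3']
print(labels(with_child))  # ['1']
print(labels(by_attr))     # ['1', '3']
```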

  • Node set XPath functions

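    • count (node set) – returns number of nodes in the passed node set

    • last () – returns the size of the current node set, ie the position of its last node

    • end() – equivalent of last() (from the older pre-standard Microsoft XSL pattern syntax, not XPath 1.0)

    • position () – returns the position number of the context node in its set (1-based) – nth element – eg return every 2nd element, starting with the 1st: [position() mod 2 = 1]

    • id ('xxx') – returns nodes that have a specific ID attribute value(s)

    • key ('keyname', 'xxx') – XSLT only – returns nodes that have a specific key value (see <xsl:key>)

    • namespace-uri (node set) – returns a string containing the namespace URI of the 1st node in the passed node set eg //*[namespace-uri() = 'http://www.w3.org/1999/XSL/Transform']

    • index() – returns index number of the context node within its parent (pre-standard Microsoft XSL pattern syntax; in XPath use position())

    • nodename() – returns the tag name including namespace prefix (pre-standard; in XPath use name())

    • nodetype() – returns a number indicating node type (pre-standard; no XPath 1.0 equivalent)

    • value() – returns the value of an element or attribute (pre-standard; in XPath use string())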

  • string XPath functions

    • string (object) – converts the passed object to a string

    • date (string) – converts the passed string to a date (a Microsoft extension, not part of XPath 1.0)

    • starts-with (string1, string2) – check if the 1st string starts with the 2nd string – eg [starts-with(@lastname, 'A')]

    • translate (string1, string2, string3) – goes through string1 char by char – each char that also occurs in string2 is replaced by the char at the same position in string3 (or removed if string3 is too short)

    • eg convert to upper case, so that the starts-with test is case insensitive:

descendant::elementX[starts-with(translate(@lastname, 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'), 'A')]
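To make the translate() semantics concrete, here is a minimal Python model of it. The function name xpath_translate is made up; it is a sketch of the XPath 1.0 behaviour (per-character mapping, removal when the 3rd string is shorter, 1st occurrence wins), not code taken from any XSLT processor.

```python
def xpath_translate(value, from_chars, to_chars):
    """Model of XPath 1.0 translate(value, from_chars, to_chars)."""
    mapping = {}
    for i, ch in enumerate(from_chars):
        if ch in mapping:
            continue  # a repeated char keeps its first mapping
        # chars with no counterpart in to_chars are marked for removal
        mapping[ch] = to_chars[i] if i < len(to_chars) else None

    out = []
    for ch in value:
        if ch not in mapping:
            out.append(ch)            # untouched char
        elif mapping[ch] is not None:
            out.append(mapping[ch])   # replaced char
        # else: removed char
    return "".join(out)

LOWER = "abcdefghijklmnopqrstuvwxyz"
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

print(xpath_translate("bar", "abc", "ABC"))       # BAr
print(xpath_translate("--aaa--", "abc-", "ABC"))  # AAA
print(xpath_translate("adams", LOWER, UPPER).startswith("A"))  # True
```

The last line mirrors the XPath expression above: upper-casing @lastname first makes the starts-with test match both "adams" and "Adams".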

  • number XPath functions

  • number (object) – converts any passed value to a number (Booleans go to 1 or 0)

  • sum (node set) – returns the sum of the numerical values of all passed nodes

  • round (n), floor(n), ceiling(n) – convert floating point number into integer

  • Boolean XPath functions

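  • not (boolean) – converts a Boolean value to its opposite

  • true () – always returns true

  • false () – always returns false

  • lang (string) – returns true if the language of the context node (set by an xml:lang attribute on it or on its nearest ancestor) matches the passed language code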



XSLT

  • XSL - original goal was to convert XML to HTML. divided into 2 parts: 1) XSLT – transformation, 2) XSL-FO - Formatting objects – still in development

  • XSL-FO - formatting objects. define the display format of an XML document. uses CSS box model. use XSLT to make document display target specific. dynamically create PDFs

  • XSLT - Extensible Stylesheet Language Transformations. While it is fairly easy to generate WebForms with an editor, this approach is not maintainable – if there is a change in layout or style, then we have to rebuild dozens of pages --> poor maintainability. A Stylesheet is a set of rules (themselves written in XML) for transforming an XML document. Also an alternative to CSS for styling a document. Rules described in XML on how to convert a document from one schema to another. Involves 3 documents: 1) XML source, 2) XSLT transformation Stylesheet, 3) XML destination. Carried out by an XSLT processor

  • Tool XSLTester from vbxml.com allows you to see source, Stylesheet & dest docs side by side

  • Templates - Each Stylesheet is composed of templates - defines how source document content elements are to appear in the destination document. like an event handler – produces nodes in the output document, but can also raise events itself. always has an XPath expression that describes what nodes in the source the template first applies to. At the start of transformation the event for processing the root is raised.

  • Transform steps: 1) starts with the XML Document data & searches for the context node to start transformation from. 2) the XSLT processor searches the XSL Stylesheet file for the correct template for transforming the node. 3) the template defines certain output nodes which are added to the result document. 4) the template specifies which node to process next (Goto step 2). 5) the process ends when there are no more nodes specified to process next. the most common form is for every template to tell the processor to continue by processing the children of the current node – ensures all nodes are processed & that no infinite loops occur

  • E.g.

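' VB6 with a reference to Microsoft XML (MSXML)
Dim oXMLDocument As New DOMDocument
Dim oXSLTransform As New DOMDocument
Dim oResultDocument As New DOMDocument
Dim strResult As String

oXMLDocument.async = False
oXSLTransform.async = False

oXMLDocument.Load "http://localhost/myVDir/myXML/x.xml"
oXSLTransform.Load "http://localhost/myVDir/myXSL/x.xsl"

' transformNode returns the result as a string
strResult = oXMLDocument.transformNode(oXSLTransform)
' transformNodeToObject writes the result into another DOMDocument
oXMLDocument.transformNodeToObject oXSLTransform, oResultDocument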
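The template-driven processing loop described in the transform steps can be modelled in a few lines of Python. This is a toy sketch, not an XSLT processor: templates are plain functions keyed by element name (a stand-in for match patterns), and the fallback imitates the built-in behaviour of copying text and recursing into children. All names are hypothetical.

```python
import xml.etree.ElementTree as ET

def apply_templates(node, templates):
    """Process one node: use its template if there is one, else the fallback."""
    rule = templates.get(node.tag)
    if rule is not None:
        return rule(node, templates)
    # Fallback modelled on the built-in rules: output text, recurse into children
    return process_children(node, templates)

def process_children(node, templates):
    """Emit the node's text and the result of processing each child in order."""
    parts = [node.text or ""]
    for child in node:
        parts.append(apply_templates(child, templates))
        parts.append(child.tail or "")
    return "".join(parts)

# Hypothetical templates: each produces output, then asks for its children
def document_template(node, templates):
    return "<html><body>" + process_children(node, templates) + "</body></html>"

def title_template(node, templates):
    return "<h1>" + process_children(node, templates) + "</h1>"

templates = {"DOCUMENT": document_template, "TITLE": title_template}

source = ET.fromstring(
    "<DOCUMENT><TITLE>Notes</TITLE><BODY><ITEM>a</ITEM><ITEM>b</ITEM></BODY></DOCUMENT>"
)
result = apply_templates(source, templates)
print(result)  # <html><body><h1>Notes</h1>ab</body></html>
```

BODY and ITEM have no template here, so only their text reaches the output. A template that forgets to process its children would silently cut off that whole subtree.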

  • XML to XHTML - One of XSL’s main uses is to transform XML to XHTML - An XML router. Unlike CSS, not limited to dynamically configuring only HTML. Element and attribute names in the generated markup must be lower case to produce valid XHTML. If you want to produce HTML instead of XHTML, use <xsl:output method="html" />

  • XML to XHTML with CSS support - insert a style attribute or a specific class attribute (that is referenced) into the specific XHTML element. This combination provides a very dynamic & powerful UI engine. Don’t have to recompile a single line of code to change a display option, just modify the .xsl & .css files. Instead of embedding cell formatting, fonts etc in the XSL, reference classes defined in CSS --> just edit the CSS for changes – don’t have to sift through XSL <font> tags. e.g.

<?xml version="1.0" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="DOCUMENT">
<html>
<body>
<xsl:apply-templates select="TITLE" />
<xsl:apply-templates select="INTRO" />
<xsl:apply-templates select="BODY" />
</body>
</html>
</xsl:template>

<xsl:template match="TITLE">
<xsl:apply-templates />
<p/>
</xsl:template>

<xsl:template match="INTRO">
<xsl:apply-templates />
<p/>
</xsl:template>

<xsl:template match="BODY">
<xsl:apply-templates select="ITEM"/>
<p/>
</xsl:template>

<!-- a comma delimited list -->
<xsl:template match="ITEM">
<xsl:apply-templates />
<xsl:if test="position() != last()">
,
</xsl:if>
</xsl:template>

</xsl:stylesheet>

  • client side XSLT - PI tells the browser which XSL file to download & use. Can take a large part of processing from server to client. E.g.

<?xml version="1.0" encoding="utf-8" ?>

<?xml-stylesheet type="text/xsl" href="x.xsl" ?>

<ROOT>

</ROOT>


XSLT elements for composing the XSLT Stylesheet:

  • <xsl:stylesheet> , <xsl:transform> , <xsl:import> , <xsl:apply-imports> , <xsl:include> , <xsl:output> , <xsl:template> , <xsl:apply-templates> , <xsl:call-template> , <xsl:attribute-set> , <xsl:strip-space> & <xsl:preserve-space> , <xsl:namespace-alias> , <xsl:key>

<xsl:stylesheet>

  • Normally the root element of a Stylesheet. Holds templates & contains configuration attributes. Elements that can only appear as direct child subelements of xsl:stylesheet are called top level elements. Attributes:

  • xmlns:xsl - To differentiate the XSLT specific elements in a Stylesheet from other XML content, the stylesheet uses the official XSLT namespace: xmlns:xsl="http://www.w3.org/1999/XSL/Transform" (a namespace name only; it doesn’t have to point to a real document)

  • version - Required. Ensures that later additions to the XSLT specification can be implemented without changing existing stylesheets. Use "1.0" for XSLT 1.0. If set to anything else, an XSLT 1.0 processor switches on forwards-compatible mode – ignores any unknown elements or elements in the wrong place

  • extension-element-prefixes - Allows XSLT processor vendors to add their own private extensions. Use to assign namespace prefixes (besides the default xsl:) as XSLT prefixes. These prefixes must be defined namespaces. The XSLT processor will watch out for any namespace prefixes specified. To determine if the XSLT processor supports an extension element or function use the XSLT functions element-available() and function-available()

  • exclude-result-prefixes - To stop specific namespaces declared in the Stylesheet from appearing in the destination document. Default – all namespace declarations in scope on a literal result element are copied to the destination document, except the XSLT namespace

  • Eg a simple Stylesheet:

<?xml version="1.0" ?>
<xsl:stylesheet
id="stylesheet1"
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
<!-- optional attributes on xsl:stylesheet, each holding a whitespace separated
list of namespace prefixes: extension-element-prefixes, exclude-result-prefixes -->
<xsl:template match="/">
<root_node/>
</xsl:template>
</xsl:stylesheet>


this stylesheet will transform any source document into the following destination document:

<root_node/>

<xsl:transform>

  • Exactly the same as xsl:stylesheet

<xsl:import>

  • Allows you to construct a Stylesheet from several reusable external stylesheet document fragments. Beware of relative paths --> VB6 DOM transform errors! The document retrieved from the URI should be a Stylesheet document itself – the children of its xsl:stylesheet element are imported directly into the main xsl:stylesheet element. Can only be used as a top level element. Must appear before any other top level elements, including xsl:template. Local rules override imported rules!!! Circular references are illegal (even indirect ones). If the main XSLT document contains templates that match the same nodes as templates in the Stylesheet imported by <xsl:import>, then the main Stylesheet templates will override the imported ones. When several documents are imported, the XSLT processor builds a tree of imported documents.

<xsl:import href="xxx.xsl" />

<xsl:apply-imports>

  • use to override & extend any imported templates. e.g. encase the transformed value between 2 strings:

<xsl:import href="xxx.xsl" />

<xsl:template match="ELEMENTX">
xxx <xsl:apply-imports /> xxx
</xsl:template>

<xsl:include>

  • Simpler than xsl:import. Just insert the rules from the referenced URI. Can only appear at the top level. Difference – can appear after an xsl:template element

<xsl:output>

  • set of attributes which specify the style of generated output. Attributes:

  • method - has 3 possible values: "xml | html | text" – upon which the other attribs are based. default = xml. if set to html: empty elements will not automatically receive a closing tag, script & style elements will not be escaped, non ASCII chars are HTML escaped. if set to text, output will be restricted to only the string values of every node

  • version - for method="xml": specifies which XML version – default = "1.0". for method="html": specifies HTML version – default = "4.0"

  • encoding - for method="xml": default to "UTF-8". for method="html": if an encoding is specified, a <meta> tag is included in the <head>

  • indent - values = "yes | no". default is no for method="xml" and yes for method="html". yes adds more whitespace to improve readability

  • cdata-section-elements - tells the processor when to use CDATA sections in the dest document and when to escape illegal chars by using entity references. e.g. < is replaced by &lt; value holds a space delimited list of element names – text nodes whose parent node appears here will be outputted as CDATA sections, else they are escaped

  • omit-xml-declaration - values = "yes | no" default is no

  • doctype-system - causes a <!DOCTYPE declaration to be output before the 1st element. the value of this attrib is the DTD URL (system identifier). the name in the DOCTYPE declaration will be the name of the root element

  • doctype-public - as above, but adds a public identifier to the DOCTYPE declaration (for xml output it is only used together with doctype-system)

  • media-type - used to specify a MIME type for dest document. default depends on the method: "text/xml" for xml, "text/html" for html, "text/plain" for text. can set other media types

<xsl:template>

  • the main building block of an XSLT Stylesheet. start processing from the XML element(s) that match the match XPath statement. consists of 2 parts: 1) the match pattern – defines which nodes can act as input for the template; 2) the implementation – defines what the output will look like

  • if several templates match and are of the same mode: a template within the main stylesheet takes precedence over an imported one; a template imported later takes precedence over a template imported earlier; among templates with the same import precedence, a template with a higher priority attrib value takes precedence over a template with a lower priority attrib value (when no priority is set, a more specific match attrib value gets a higher default priority than a more general one); if several templates still remain with the same priority, this is strictly an error, but processors may recover by letting the one nearest the bottom take precedence.

  • the attributes name, priority & mode are used to differentiate between multiple templates that match on the same node

  • match - holds the matching pattern for the template. specifies what XML element in the source document to apply the template to. syntax – a subset of XPath that uses only the axes: child, attribute & // (descendant)

eg match the document root

<xsl:template match="/">


eg match all nodes, but not attributes & the root:

<xsl:template match="node()">


eg match any elementY that has an elementX as a parent:

<xsl:template match="child::elementX/child::elementY">


eg match any elementY that has an elementX as an ancestor:

<xsl:template match="elementX//elementY">


eg match any elementX or elementY (a match pattern cannot be wrapped in parentheses):

<xsl:template match="elementX|elementY">

  • name

  • priority - can be any positive or negative numeric value. if not set, it is calculated from the match attrib as a value between 0.5 and -0.5 as follows:

a specific name along a child or attribute axis --> priority="0"

e.g. child::ELEMENTX or @attribx


a namespace prefix with an unspecified local name along a child or attribute axis --> priority="-0.25"

e.g. myns:* or attribute::myns:*


only a node test, including * --> priority="-0.5"

e.g. node() or text() or * or attribute::*

all other cases --> priority="0.5"
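As a cross-check, here is a rough Python sketch of how the default priority could be computed, following the rules in section 5.5 of the XSLT 1.0 specification as I read them: a specific name gives 0, prefix:* gives -0.25, a bare node test (including *) gives -0.5, anything else gives 0.5. The function name and the simplified name regex are my own assumptions, and it handles a single pattern alternative only (a pattern with | is treated as one rule per alternative).

```python
import re

# Simplified NCName; real XML names allow more characters than this.
NAME = r"[A-Za-z_][\w.-]*"

def default_priority(pattern):
    """Default priority of one pattern alternative (no '|'), XSLT 1.0 style."""
    # An optional child:: / attribute:: / @ axis specifier does not change the result
    step = re.sub(r"^(child::|attribute::|@)", "", pattern.strip())
    if re.fullmatch(rf"({NAME}:)?{NAME}", step):
        return 0.0    # specific (possibly prefixed) name
    if re.fullmatch(r'''processing-instruction\(('[^']*'|"[^"]*")\)''', step):
        return 0.0    # processing instruction with a specific target
    if re.fullmatch(rf"{NAME}:\*", step):
        return -0.25  # any name in one namespace
    if step == "*" or re.fullmatch(
        r"(node|text|comment|processing-instruction)\(\)", step
    ):
        return -0.5   # bare node test
    return 0.5        # paths, predicates, the root, ...

for p in ["child::ELEMENTX", "@attribx", "myns:*", "*", "node()", "a/b", "a[1]", "/"]:
    print(p, default_priority(p))
```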

  • mode - use when templates are only applied in special cases. e.g. print templates

  • implementation - any subelements are the content that is placed in the destination document - eg <root_node> - a literal element, not an XSLT element

  • 2 default templates are provided - can be overruled by creating a template that matches the same nodes. default templates process all nodes in the source document. if you try to transform a source document using only the built in templates, the 1st built in template would match the source document root then process all child nodes --> processes the document but produces no output except for any text nodes (matched by 2nd built in template)

1) match all elements & the root

<xsl:template match="*|/">
<xsl:apply-templates />
</xsl:template>


2) match text nodes & all attributes

<xsl:template match="text()|@*">
<xsl:value-of select="." />
</xsl:template>

  • simplified syntax - for simple Stylesheets consisting of only 1 template that matches the root, the whole document is considered the content of the template

<html xsl:version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<body>
<xsl:value-of select="/ELEMENTX/ELEMENTY" />
<p/>
</body>
</html>

<xsl:apply-templates>

  • start the processing of a node – invoke another template. way to tell XSLT processor to continue processing other nodes after finding a source document node that matches an xsl:template & implementing it in the destination document. the most common XSLT mistake is to leave this out! selects nodes to process next using an XPath expression. set the next context node for the XSLT processor to apply an xsl:template to. the transformed output of the specified node will appear within the output generated by the current template. attributes:

  • select - specifies which nodes to transform next using an XPath expression. if not specified, defaults to child::node() – all child nodes of the current context excluding attributes

  • mode - only use an <xsl:template> if it has the same mode attrib value

  • eg do all elements!

<xsl:template match="/">
<root_node>
<!-- processes the child nodes of the root -->
<xsl:apply-templates />
</root_node>
</xsl:template>

<!-- triggered by the 1st <xsl:apply-templates> AND recursively! -->
<!-- matches every element -->
<xsl:template match="*">
<!-- just an arbitrary node we made up -->
<result_node>
<!-- go to child elements; recursive because of "*" -->
<xsl:apply-templates />
</result_node>
</xsl:template>

  • eg do all elements to HTML output using literals!

<xsl:template match="/">
<HTML><BODY>
<xsl:apply-templates />
</BODY></HTML>
</xsl:template>

<xsl:template match="*">
<xsl:apply-templates />
<!-- some content here -->
</xsl:template>

<xsl:call-template>

  • used to organize templates in a document. the target template must have a name attribute. works like <xsl:apply-templates> but without changing the context node. if a template is called by name attrib the match attrib is ignored. see Wrox proff VB6 XML pg 196 for example

<xsl:attribute-set>

  • use to define attributes that multiple elements can use --> smaller & easier to maintain XSL documents. can be used to define attributes that are used together e.g.

<xsl:attribute-set name="myattribs">
<!-- values kept on one line so no stray whitespace ends up in the attribute -->
<xsl:attribute name="size">5</xsl:attribute>
<xsl:attribute name="face">Arial</xsl:attribute>
</xsl:attribute-set>

<xsl:template match="ELEMENTX">
<font xsl:use-attribute-sets="myattribs">
<xsl:apply-templates />
</font>
</xsl:template>

<xsl:strip-space> & <xsl:preserve-space>

  • 2 points during transformation where whitespace can appear or not: 1) when parsing the source & Stylesheet documents and constructing a tree, 2) when encoding a generated XML tree to the dest document. once the tree is built, whitespace stripping removes all text nodes that: consist entirely of whitespace chars, have no ancestor node with the xml:space attrib set to preserve, and are not children of a whitespace preserving element. processing continues after the whitespace has been stripped from the source & Stylesheet docs. for the Stylesheet, the only whitespace preserving parent is <xsl:text>. NOTE: by default, all elements in the source document preserve whitespace

  • elements attribute - space delimited list of element names in the source doc (wildcards such as * are allowed, but not general XPath expressions). use to specify which elements to explicitly strip or preserve whitespace for. conflicts are resolved like the Template conflict resolution

<xsl:namespace-alias>

  • used only when transforming a source doc into a XSLT document. allows dest document to hold the XSLT namespace & literal elements without interfering with the transformation process. allows you to use another namespace in the Stylesheet and have its declaration show up in the dest document with another URI. in the transforming XSLT, literal XSLT output elements have fake namespaces, but in the dest XSLT document, the same prefixes will refer to the correct URI. uses 2 attribs to accomplish this: stylesheet-prefix and result-prefix. e.g.

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:myxsl="http://www.w3.org/1999/XSL/MyTransform">
<xsl:namespace-alias stylesheet-prefix="myxsl" result-prefix="xsl" />
<xsl:template match="/">
<myxsl:stylesheet>
<xsl:apply-templates />
</myxsl:stylesheet>
</xsl:template>
</xsl:stylesheet>

<xsl:key>

  • analogous to creating an RDBMS index. attribs: name - specify a name for the key, match - pattern for the nodes to index, use - expression giving the value to index each node by (eg an attrib). allows direct access to document nodes via the key() function - 2 params: name & value - returns a node set holding all matching nodes. note: a node can be indexed by multiple keys. use when XML elements refer to each other using IDs, but without validation. increases readability & helps perf as the XSLT processor can keep all source document key references in an in memory hash table eg

to set a key:

<xsl:key name="mykey" match="ELEMENTX" use="@attribx" />

to use:

key('mykey', 'xxx')
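The index analogy can be sketched in Python as a dictionary built once and then queried by value. This only models the idea behind xsl:key and key(); the helper name build_key and the sample document are made up, and a real processor is free to implement keys differently.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical source document; n just labels each element.
doc = ET.fromstring(
    "<root>"
    "<ELEMENTX attribx='xxx' n='1'/>"
    "<ELEMENTX attribx='yyy' n='2'/>"
    "<group><ELEMENTX attribx='xxx' n='3'/></group>"
    "</root>"
)

def build_key(root, match_tag, use_attr):
    """Index every match_tag element in the document by its use_attr value."""
    index = defaultdict(list)
    for elem in root.iter(match_tag):      # document order, any depth
        value = elem.get(use_attr)
        if value is not None:
            index[value].append(elem)
    return index

# Roughly: <xsl:key name="mykey" match="ELEMENTX" use="@attribx" />
mykey = build_key(doc, "ELEMENTX", "attribx")

# Roughly: key('mykey', 'xxx') returns a node set, here a list of elements
hits = mykey.get("xxx", [])
print([e.get("n") for e in hits])  # ['1', '3']
```

The lookup returns every node with that value wherever it sits in the document, which is what makes keys handy for following ID style cross references.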



XSLT Elements that generate output elements

  • Place as subelements of xsl:template elements

  • Literals - The most easily understandable elements in an XSLT Stylesheet. Any fragment of valid XML within an <xsl:template> element that is not in the xsl: namespace. Passed to destination document as is. Can include both text & XML elements. Must always be well formed. Cannot include other node types e.g. comments or CDATA

  • Attribute value templates - Much more readable than xsl:attribute. An attribute value can contain an XPath expression in curly brackets; the expression is evaluated and replaced by its string value when the element is output

In source XML Doc:

<imageElement>
<url>myImages/x.jpg</url>
<size width="50" />
</imageElement>


In XSL file:

<xsl:template match="imageElement">
<img src="{url}" width="{size/@width}" />
</xsl:template>


In dest XML Doc:

<img src="myImages/x.jpg" width="50" />

<xsl:value-of>

  • generates text output containing the string value of the selected node (or of the 1st node, if several are selected). attributes:

  • select - required. indicates which node’s value to output. an XPath expression that is evaluated in the template’s context. eg select="@attributeX"

  • disable-output-escaping - use if source XML doc text contains reserved chars that you don’t want the XSLT processor to replace with escape chars. only use if you don’t require the dest doc to be well formed. (can use <xsl:eval no-entities='true' instead)

<xsl:copy>

  • creates a node in the destination document with the same node name & node type as the context node. doesn’t copy attributes or children of context node

<xsl:copy-of>

  • copy a set of nodes to the destination document. has a select attribute to indicate which nodes to copy relative to the source context node. similar to xsl:value-of except it copies instead of converts & it will copy all selected nodes, not just the 1st

<xsl:element>

  • allows creation of elements in destination document. attributes (actually attribute value templates):

  • name – compulsory - specifies destination element name

  • namespace – optional - set the namespace of the created element. if set, the XSLT engine may change the prefix set in the name attribute

<xsl:attribute>

  • generates attributes in the destination document. set the value of a given attribute to the value obtained dynamically through an XPath select statement in an <xsl:value-of> element

  • limitations: cannot insert an attribute after child elements have been added to that element; can only be used in the context of an element (can’t add to the context of a comment node); within the attribute element, no node may be generated other than text nodes; attribute nodes can obviously not have child nodes

  • often used to create attributes in the output that have a calculated name. attribute value templates includes all XSLT literal attributes + some attributes on predefined XSLT elements

<xsl:template match="@AttributeX">
<xsl:attribute name="AttributeY">
<xsl:value-of select="." />
</xsl:attribute>
</xsl:template>

  • can contain an expression to be evaluated within curly braces, which is evaluated before the attribute’s container element is executed e.g.

<ELEMENTX attribx="xxx{4 + 5}" />

-->

<ELEMENTX attribx="xxx9" />

  • the expression can be XPath – e.g.

--source xml:

<PICS>
<PIC idattrib="xxx">
<URL>images/x.jpg</URL>
<SIZE width="50" />
</PIC>
</PICS>


--xsl:

<xsl:template match="/">
<DOCUMENT>
<IMAGES>
<xsl:apply-templates select="PICS/PIC" />
</IMAGES>
</DOCUMENT>
</xsl:template>

<xsl:template match="PIC">
<img src="{URL}" width="{SIZE/@width}">
<xsl:attribute name="id">
<xsl:value-of select="@idattrib" />
</xsl:attribute>
</img>
</xsl:template>


--dest xhtml:

<DOCUMENT>
<IMAGES>
<img src="images/x.jpg" width="50" id="xxx"/>
</IMAGES>
</DOCUMENT>

<xsl:text>

  • this element creates a text node in the destination document, holding the content of the original text element. better than a literal because it can include whitespace. attributes:

  • disable-output-escaping - see <xsl:value-of> for explanation

  • e.g. the following 2 are identical in result, apart from the whitespace around xxx:

<xsl:template match="ELEMENTX">
<xsl:text> xxx </xsl:text>
</xsl:template>


<xsl:template match="ELEMENTX">
xxx
</xsl:template>

<xsl:processing-instruction>

  • generate a PI in the destination document. typically for preprocessing – specifying the transformation rules for the next step. can only contain text, no subelements. the text cannot contain the string ?> as it signals end of PI.

  • name attribute must contain a valid PI name, but not “xml” – this XSLT element cannot be used to generate the actual XML declaration

  • has 2 attributes that must be in the text: href and type – must be created as text nodes, not attributes, because the PI content is not necessarily well formed XML

<xsl:processing-instruction name="xml-stylesheet">href="x.xsl" type="text/xsl"</xsl:processing-instruction>

-->

<?xml-stylesheet href="x.xsl" type="text/xsl"?>

<xsl:comment>

  • the way to create new comments in the destination document, because comments written in the Stylesheet itself are ignored

<xsl:comment> xxx </xsl:comment>

-->

<!-- xxx -->

<xsl:number>

  • numerical conversion tool – creates a formatted numeric value in the output. simplest way to use is just specify value attrib. attribs:

  • value - e.g. “1000” - string is evaluated & converted to a number, rounded to an integer & converted back into a string

  • level - can be “single | multiple | any”. single is default, multiple can return many concatenated numbers

  • count - specify a pattern to match

  • from - specify a pattern. allows us to look at only matching part of the ancestor axis

  • format - values can be: "1" – default = integers, "01" – integers with 0 prefix, "I", "i" – upper/lower case roman numerals, "A", "a" – upper/lower case alphabetic

  • grouping-separator - the separator char between groups of digits, eg ","; ignored unless grouping-size is also set

  • grouping-size - a number to specify how many digits in each group

  • e.g.

<xsl:number value="1000000" grouping-size="3" grouping-separator="," />

-->

1,000,000

<xsl:eval> and <xsl:script>

  • not standard – Microsoft extensions. the <xsl:eval> element generates a text node in the dest document using script. the <xsl:script> element contains script function definitions that can be called from the <xsl:eval> element or from attribs that are evaluated as an expression. because scripting languages contain non allowable XML chars, place in CDATA nodes. the this value refers to the context node. e.g.

--source XML:

<ELEMENTX attribx="A" attriby="100" />


--XSL:

<xsl:template match="ELEMENTX[@attribx='A']">
<xsl:eval language="JavaScript">
getInt(this.getAttribute('attriby'))
</xsl:eval>
</xsl:template>

<xsl:script language="JavaScript"><![CDATA[
function getInt(intX)
{
return intX * 10;
}
]]></xsl:script>

<xsl:message>

  • rarely used. for issuing warnings & errors. has 1 attrib: terminate – if set to “yes” the processor stops after the message.


XSLT programmatic Elements

<xsl:if>

  • for conditionals. has 1 attrib: test – takes an XPath expression as a param. If it returns false (no nodes) then the content of this element is not executed. can be nested. if you have multiple conditions, use <xsl:choose> instead

<xsl:template match="ELEMENTX">
<xsl:if test="@attribx='xxx'">
<xsl:value-of select="@attribx" />
</xsl:if>
</xsl:template>

<xsl:choose> , <xsl:when> , <xsl:otherwise>

  • a case statement

<xsl:template match=”ELEMENTX” >

<xsl:choose>

<xsl:when test=”@attribx=’xxx’” >

<xsl:value-of select=”@attribx” />

</xsl:when >

<xsl:otherwise >

<xsl:text> not found </xsl:text>

</xsl:otherwise >

</xsl:choose>

</xsl:template >

<xsl:for-each>

  • iterate through all XML elements that match the select XPath statement. for iterating through a set of nodes. select attrib holds an XPath expression. same functionality as <xsl:apply-templates> (as an inline template), but much easier to read

<xsl:template match=”ELEMENTX” >

<xsl:for-each select=”child::ELEMENTY”>

<xsl:value-of select=”attribute::attribx” />

<xsl:text> &#13; </xsl:text> ‘--CR

</xsl:for-each>

</xsl:template>

<xsl:sort>

  • can be used to sort an XSLT iteration. can only be a child element of <xsl:apply-templates> or <xsl:for-each>. optional attribs:

  • order - value can be “ascending | descending”

  • data-type - value can be “text | number”. default value is text, so keys are compared as strings and “10” sorts before “9”

  • case-order - value can be “lower-first | upper-first”

<xsl:template match=”ELEMENTX” >

<xsl:apply-templates select=” ELEMENTY”>

<xsl:sort select=”attribute::attribx” />

<xsl:sort select=”attribute::attriby” order=”ascending” />

</xsl:apply-templates>

</xsl:template>
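
The difference between sorting keys as text and as numbers is easy to see outside XSLT. A small Python sketch (an analogy only: Python compares strings by code point, while an XSLT processor may apply language-specific collation rules to text keys):

```python
values = ["10", "9", "2"]

# Text comparison: keys are compared as strings, character by character
as_text = sorted(values)

# Numeric comparison: each key is converted to a number before comparing
as_number = sorted(values, key=float)

print(as_text)    # ['10', '2', '9']
print(as_number)  # ['2', '9', '10']
```

For numeric attributes, set the data-type attribute of <xsl:sort> explicitly to get the second ordering.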

<xsl:variable>

  • acts like a constant. has a name attrib & an optional select attrib. <xsl:copy-of> is a convenient way to insert variable value into dest document

declare using an XPath expression -> numeric value in this case

<xsl:variable name="varx" select="2" />


declare using an included XML fragment -> string value

<xsl:variable name="varx"> 2 </xsl:variable>


to reference the variable, prefix its name with “$”

<xsl:value-of select=”item[$varx]” />

<xsl:copy-of select=”$varx” />

<xsl:param> and <xsl:with-param>

  • acts like a variable. defined as a sub element of an <xsl:template> - a value can be passed when the template is executed via <xsl:call-template> or <xsl:apply-templates>. its select value is used as an initial value and can be changed - used to set attribs of other xsl elements. Set value from another template using <xsl:with-param> - Can be a child of <xsl:call-template> or <xsl:apply-templates> & has 2 attribs: name and select. see Wrox Proff VB6 XML pg 203 to see how to do from DOM. e.g.

'-- set up a template containing a number that uses a param value to set one of its attribs

<xsl:template name="mytemplate">

<xsl:param name="paramx" select="'1. '" />

<xsl:number format="{$paramx}" />

<xsl:apply-templates />

</xsl:template>


'-- set the value of the param from another template. <xsl:with-param> must be a direct child of <xsl:call-template>, so the condition wraps the call

<xsl:template match="ELEMENTX">

<xsl:choose>

<xsl:when test="@attribx='xxx'">

<xsl:call-template name="mytemplate">

<xsl:with-param name="paramx" select="'a. '" />

</xsl:call-template>

</xsl:when>

<xsl:otherwise>

<xsl:call-template name="mytemplate" />

</xsl:otherwise>

</xsl:choose>

</xsl:template>

XSLT Functions

  • Can be used in expressions

generate-id(node-set)

  • Generates a unique ID as a string of alphanumeric chars. Depends on the 1st node in the passed node-set. If no node-set is passed, the context node is used

format-number(number, string, string)

  • 1st param - the number that the function converts to a string

  • 2nd param - use to specify format - can have 2 values separated by a semicolon to handle positive & negative values. can use the following chars: 0 – digit , # - digit without leading & trailing zeros, . - decimal separator, , - grouping separator, - - negative prefix, % - percent, X - any char can serve as a prefix or suffix

  • 3rd param - optional reference to an <xsl:decimal-format> element (must be a top level element and has attribs including: name, decimal-separator, grouping-separator)

  • e.g.

<xsl:decimal-format name="myformat"

decimal-separator=","

grouping-separator="." />

<xsl:template match="/">

<xsl:value-of select="format-number(1111.1, '#.###,00', 'myformat')" />

</xsl:template>
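
With “,” as the decimal separator and “.” as the grouping separator, the pattern '#.###,00' should render 1111.1 as 1.111,10. A rough Python sketch of the same separator swap, assuming a hypothetical helper format_with_separators that is fixed to two decimals and groups of three (it does not parse format-number patterns):

```python
def format_with_separators(value: float,
                           decimal_sep: str = ",",
                           grouping_sep: str = ".") -> str:
    """Render value with two decimals, then swap in the custom separators."""
    # Python's default grouping uses ',' for groups and '.' for decimals
    default = f"{value:,.2f}"
    # Swap both separators in a single pass so they do not overwrite each other
    swap = str.maketrans({",": grouping_sep, ".": decimal_sep})
    return default.translate(swap)


print(format_with_separators(1111.1))
```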

document(object, node-set)

  • use to combine info from several source docs into 1 dest doc. object param is usually a string holding a document URI (if a node-set is passed instead, the string value of each node is used as a URI). node-set is an optional 2nd param whose first node supplies the base URI for resolving relative URIs. requires some XML that references all the documents. e.g. combine stuff in local doc with referenced docs

'-- source doc

<ELEMENTX>

<ELEMENTY/>

<ELEMENTZ>

<ELEMENTABC/>

</ELEMENTZ>

</ELEMENTX>

<ITEM type="url" loc="http://www.x.com/xxx.xml" />

<ITEM type="local" loc="xml/xxx.xml" />

‘—xsl:

<xsl:template match=”ITEM” >

<xsl:if test=”@type=’url’” >

<a href=”{@loc}”>

<xsl:value-of select=”.” />

</a>

</xsl:if >

<xsl:if test=”@type=’local’” >

<a href=”{@loc}”>

<xsl:value-of select=”document(concat(‘xxx’, @loc, ‘.xml’))

/ELEMENTX/ELEMENTY”/>

</a>

(by: <xsl:apply-templates

select=”document(concat(‘xxx’, @loc, ‘.xml’))

/ELEMENTX/ELEMENTZ/ELEMENTABC”/>)


</xsl:if >

</xsl:template>

current()

  • returns the current node: the node being processed by the enclosing template or for-each, as opposed to the context node inside a predicate. useful in sub-queries & XPath expressions. allows construction of XPath expressions similar to SQL Joins - combine & compare values from different contexts. e.g. (confusing)

each ELEMENTX has 2 sub elements ELEMENTY (one) & ELEMENTZ (many)

1st predicate checks if the text of ELEMENTZ relative to the selected ELEMENTX is equal to the text of the current node -> selects the ELEMENTX whose ELEMENTZ subelement has the same text value as the current ELEMENTZ

2nd predicate checks that the ELEMENTX’s ELEMENTY subelement’s text value is not equal to the current ELEMENTZ’s ancestor ELEMENTX’s ELEMENTY subelement’s text value, so the ELEMENTX that contains the current ELEMENTZ is left out

<xsl:template match="ELEMENTZ">

<xsl:for-each select="//ELEMENTX[ELEMENTZ/text() = current()/text()]

[ELEMENTY != current()/ancestor::ELEMENTX/ELEMENTY]">

<a>

<xsl:attribute name=”href” >

mylink <xsl:number/> ‘—e.g. mylink1

</xsl:attribute>

<xsl:value-of select=”ELEMENTY” /> ‘—text value

</a>

<br/>

</xsl:for-each>

</xsl:template >



XQuery


XQuery Overview

  • This technology facilitates query searches of distributed datasources (XML documents or XML Streams exposed from Relational Databases or WebServices) over the entire semantic web.

  • XQuery is to XML data what SQL is to relational data

  • Designed for working with node sets, not individual values

  • Returns a structured, hierarchical (user defined) dataset, unlike SQL which returns a flattened dataset that must be rebuilt as XML

  • Standard - platform & DB independent - can operate over any form of XML

  • Limits - read only - unlike SQL, does not support updates

  • Superseded competition - XQL, SQLX, Oasis TransQuery

  • Version 2.0 - supports read/write, overloading & polymorphic functions, extensibility mechanisms, data definition facilities for persistent views, ability to access the SQLXML functionality of XSD mapping schemas (to improve performance and more efficiently control the structure of the returned XML outside of the SQLXML query), ability to specify a URI to retrieve XML directly from SQL Server, and XML Templates that contain multiple XML PIs (processing instructions )


W3C XML specifications interact with XQuery:

  • XPath

    • A component of XSLT & XQuery

    • Expresses paths through an XML hierarchy to one or a set of nodes

    • XPath 2.0 is sequence based - can understand the concept of ordering in a sequence of nodes

  • XML Schema

    • Templates for an XML document

    • XQuery is compatible & can define XML behaviors beyond just the minimum & maximum bounds

  • PSVI Infoset

    • Post schema validation

    • Generated by XML Schema processors to ensure that the XQuery data model captures (& stores in memory ) everything the parser determines about the document

    • http://www.w3.org/TR/xml-infoset/

  • XSLT

    • XQuery is more than XSLT without the template rules & some XPath axes - has several advantages over XSLT:

    • XQuery can query over multiple documents

    • XQuery is a more compact syntax

    • XQuery supports the XML Schema type system

    • XQuery is designed for database optimization


XQuery syntax

  • Path expressions - based on XPath 1.0 syntax, can navigate to a set of nodes or values in the XML document - includes an inter-document dereference & range operators

  • element constructors - return hierarchical XML data using the literal <starttag> & </endtag> elements to build resultset XML structure & curly brackets {} to distinguish literal content from evaluated sub-expressions

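  • FLoWeR (FLWR) expressions - (FOR, LET, WHERE, RETURN) - a more powerful version of the SQL SELECT statement - return information that satisfies the condition. keywords are written in upper case throughout these notes; the final XQuery 1.0 syntax uses lower case keywords (for, let, where, return)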

  • Rich set of expression functions & operators

  • Conditional expressions - IF ... THEN ... ELSE

  • quantified expressions - flower expressions can check that some or all tuples created by the FOR & LET clauses satisfy a given condition by using the WHERE SOME & WHERE EVERY predicate clauses - test every value of a collection, composing output while narrowing a search - EVERY operates like a logical AND, and SOME operates like a logical OR

  • E.g.

FOR $NodeSetX IN document("xxx.xml")//elementX

WHERE SOME $NodeSetY IN $NodeSetX/elementY SATISFIES contains ( $NodeSetY , "xxx")

RETURN

$NodeSetX
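
SOME and EVERY behave like Python's any() and all() over the bound sequence. A sketch using the standard library's xml.etree.ElementTree, where the inline SAMPLE document and its id attributes are assumptions made up for illustration (they stand in for xxx.xml):

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for xxx.xml
SAMPLE = """
<root>
  <elementX id="1"><elementY>aaa</elementY><elementY>has xxx inside</elementY></elementX>
  <elementX id="2"><elementY>bbb</elementY></elementX>
  <elementX id="3"><elementY>xxx</elementY><elementY>xxx again</elementY></elementX>
</root>
"""

root = ET.fromstring(SAMPLE)

# SOME ... SATISFIES: at least one elementY child contains "xxx" (logical OR)
some_match = [x.get("id") for x in root.iter("elementX")
              if any("xxx" in (y.text or "") for y in x.findall("elementY"))]

# EVERY ... SATISFIES: all elementY children contain "xxx" (logical AND)
every_match = [x.get("id") for x in root.iter("elementX")
               if all("xxx" in (y.text or "") for y in x.findall("elementY"))]

print(some_match)
print(every_match)
```

Like SOME, any() stops at the first hit, which is where the efficiency gain of a quantified expression comes from.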

  • Expressions that test or cast (modify) data types - can treat an expression as though it were a subtype of its actual type

  • Variables are not assigned to, but bound, so that values are immutable - prevents side effects of value reassignment, allows query optimization

  • Select members of a set & return them to the output stream

  • the FOR clause - handles iteration over sequences by specifying a variable (prefixed with "$") that acts as a variable reference that points to each element in the selection of nodes

  • The IN clause - reference an XPath expression to assign values to the variable specified in the FOR clause

  • The LET clause - handles assignment of sequences to a placeholder variable

  • The RETURN clause - return XML results as specified within the curly brackets - converts their contents into XML

  • RETURN does not work like a Return statement in a programmatic language - it doesn't return a single value, instead acting as a pattern construction template for each value assigned to each NodeSet variable specified in a FOR or LET clause - ensuring delivery of well formed XML fragments

  • Subqueries & expression chaining - compose complex queries by including a FLWR expression within a RETURN clause

  • The WHERE clause - evaluates a boolean predicate to filter the returned NodeSet

  • The IN clause - same functionality as WHERE, but uses XPATH

  • WHERE string ($NodeSetX/elementY) = "xxx" --is equivalent to: IN document ("xxx.xml") //elementX[elementY="xxx"]

  • The document () function references an XML document

  • The string() function retrieves the text node of an element and converts it into a string expression

  • E.g. select member elements of a set:

FOR $NodeSetX IN document ("xxx.xml") // elementX

WHERE string ($NodeSetX/elementY) = "xxx"

RETURN {$NodeSetX} -- return matching elements

  • The RETURN statement can return complex expressions - e.g. an XHTML unordered list:

RETURN

<li>

{string ($NodeSetX/elementY)}

</li>

  • can embed an XQuery expression within any XML element - e.g. in XHTML:

<body>

<ul>

FOR $NodeSetX IN document ("xxx.xml") // elementX

WHERE string ($NodeSetX/elementY) = "xxx"

RETURN

<li>

{string ($NodeSetX/elementY)}

</li>

</ul>

</body>

  • The LET clause - assign node sequences to a single variable to be used by the FOR statement to iterate through a sequence

  • The LET statement creates NodeSet variables of 0 or more nodes - the NodeSet is connected to the source XML as each node acts as a pointer to a source node

  • Use the XPath function value-distinct to remove duplicate nodes from a NodeSet

  • E.g.

LET $NodeSetX := document ("xxx.xml") // elementX

LET $NodeSetY := value-distinct (document ("xxx.xml") // elementY)


FOR $NodeSetA IN $NodeSetX

RETURN

<elementA >

<elementJ >

{string($NodeSetA)}

</elementJ>

<elementY>

{

FOR $NodeSetB IN $NodeSetY

WHERE $NodeSetB/elementA = $NodeSetA

RETURN

<elementB>

{string($NodeSetB/elementZ )


}

</elementB>


}

</elementY>


</elementA>

  • Constraining FLWR statements - improve query efficiency by terminating a set of comparisons once a hit has been met

  • use the SOME and SATISFIES keywords to setup a condition where queries are made until a single conditional expression evaluates as true, at which point the RETURN expression is called - e.g.

FOR $NodeSetX IN document ("xxx.xml") // elementX

WHERE SOME $propertyX IN $NodeSetX/*

SATISFIES contains ($propertyX, 'abc')

RETURN { $NodeSetX }

  • Note: $NodeSetX/* - all children

  • The BEFORE & AFTER keywords constrain the range of searchable elements in either direction, reducing the amount of processing required - e.g.

FOR $NodeSetX IN document ("xxx.xml") // elementX

BEFORE string ($NodeSetX/attributeX ) = "aaa"

AFTER string ($NodeSetX/attributeX ) = "eee"

RETURN { $NodeSetX }

  • can use conditional statements in a RETURN clause (more powerful than WHERE) - e.g.

RETURN <elementX>

{

IF ($NodeSetX/elementY )

THEN

{

string($elementA),

string($elementB/@href)

}

ELSE

{

string($elementC)

}

}

  • the SORTBY operator acts on the FOR operator to sort a sequence by the value of a sub-node of the NodeSet - each sorting expression is evaluated for each item in the source expression and its values converted to be an operand for the ">" operator -sort order can be ASCENDING / DESCENDING - e.g.

FOR $NodeSetX IN document ("xxx.xml") //elementX[elementY ="xxx"]

RETURN { $NodeSetX }

SORTBY {elementZ DESCENDING}
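
The same filter-then-sort shape can be sketched in Python with xml.etree.ElementTree, which supports the child-value predicate used here. The SAMPLE document is a made-up stand-in for xxx.xml:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for xxx.xml
SAMPLE = """
<root>
  <elementX><elementY>xxx</elementY><elementZ>b</elementZ></elementX>
  <elementX><elementY>other</elementY><elementZ>z</elementZ></elementX>
  <elementX><elementY>xxx</elementY><elementZ>d</elementZ></elementX>
  <elementX><elementY>xxx</elementY><elementZ>a</elementZ></elementX>
</root>
"""

root = ET.fromstring(SAMPLE)

# FOR ... IN //elementX[elementY = "xxx"]
matches = root.findall(".//elementX[elementY='xxx']")

# SORTBY (elementZ DESCENDING): order by the text of the elementZ child
ordered = sorted(matches,
                 key=lambda x: x.findtext("elementZ", default=""),
                 reverse=True)

sorted_keys = [x.findtext("elementZ") for x in ordered]
print(sorted_keys)
```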

  • TO - return a sequence range of nodes - Can be used in LET or FOR statements:

LET $NodeSetX := document("xxx.xml")//record[3 TO 10, 15 TO 20]

FOR $NodeSetX IN document("xxx.xml")//record[3 TO 10, 15 TO 20]
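
Positions in a TO range are 1-based and inclusive at both ends, which is easy to get wrong when translating to a 0-based language. A short Python sketch, assuming a hypothetical helper called positions and a made-up list of 25 records:

```python
def positions(items, *ranges):
    """Select items by 1-based inclusive (start, end) ranges, e.g. [3 TO 10, 15 TO 20]."""
    selected = []
    for start, end in ranges:
        # 1-based inclusive range -> 0-based half-open slice
        selected.extend(items[start - 1:end])
    return selected


records = [f"record{n}" for n in range(1, 26)]
subset = positions(records, (3, 10), (15, 20))

print(len(subset))
print(subset[0], subset[7], subset[8], subset[-1])
```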


Datatypes

  • based on XML Schema - 5 categories: numeric, string, date, node types, sequences

  • Each datatype has its own constructor function that casts a string into the appropriate datatype - e.g.

<elementX>

{

LET $a := float(10.5)

LET $b := float(16.8)

RETURN { $a + $b}

}

</elementX >

  • note: the returned content between the curly brackets is automatically converted into a string

  • Objects can then be input into other expressions or saved to local variables - e.g.

{

LET $startDate := dateTime ("2003-10-10")

LET $interval := 7

LET $newDate := add-days ($startDate , $interval )

RETURN <date>

{ get-day($newDate)} /

{ get-month($newDate)}

{get-year($newDate)}

</date>


}
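
For comparison, the same date arithmetic in Python's standard library (an analogy only; add-days, get-day, get-month and get-year are the function names used in these notes, not Python functions):

```python
from datetime import date, timedelta

start_date = date(2003, 10, 10)
interval = 7

# Equivalent of add-days($startDate, $interval)
new_date = start_date + timedelta(days=interval)

# Equivalent of composing the <date> element from day, month and year
result = f"<date>{new_date.day}/{new_date.month}/{new_date.year}</date>"
print(result)
```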

  • The numeric types & cast functions in order of lowest to highest precedence: integer, byte, int, short, long, decimal, float, double - comparisons convert to the operand with the highest precedence

  • numeric operators: = < > <= >= != + - * div mod - includes unary + -

  • the "/" character is reserved for XPath expression child nodes, thus div is used instead

  • Numeric functions: floor() , ceiling() , round(), abs()

  • Boolean functions: true() , false(), (make code more legible) , boolean-from-string()

  • Boolean functions: boolean-and() , boolean-or(), boolean-not(), not(), not3()

  • 0 is false, all other numbers are true: 3 and 0 = false

  • Dates = true

  • Empty string, empty NodeSets & empty sequences = false

  • The string types (- consist of Unicode character sequences) & cast functions :

    • String

    • NormalizedString - no leading or trailing spaces, all other spaces are 1 char length

    • Token - sequence of alphanumeric characters

    • Language - a sequence that matches the IETF language codes - e.g. "EN"

    • Name - a generalized XML name

    • NMToken - a token used as an Enum value

    • ID - a token used as an ID value

    • IDREF - a token used as an IDREF value

    • ENTITY - a sequence of characters used as an entity

  • String literals can be created by wrapping a sequence of characters in either single or double quotes

  • String comparison - not supported in XPath 1.0 - cannot test the predicate 'xxx' > 'yyy' - can do directly in XSLT via <xsl:sort>

  • Collation -set of rules to describe the relative ordering of 2 strings

  • the codepoint-compare() function - compares 2 strings by their Unicode code points - returns 0 if equal, otherwise -1 or 1 depending on which string sorts first

  • The compare() function takes a 3rd argument that specifies a URL to a collation file

  • String functions:

    • Concat(s,s,s...)

    • Starts-with(s,s), ends-with (s,s)

    • Codepoint-contains(), contains()

    • Codepoint-substring(s,s), Substring()

    • String-length ()

    • Codepoint-substring-before (s,s), Substring-before(s,s, collation)

    • Codepoint-substring-after(s,s), Substring-after(s,s,collation)

    • Normalize-space(s), normalize-Unicode (s, normalizationForm)

    • Upper-case(s), lower-case(s)

    • Translate(s, s, s) - replace

    • Match(s, regex) - regular expression

    • Replace(s, regex, s, collation)

    • String-pad-beginning(s,i,s), String-pad-end(s,i,s)

  • Most XQuery work is done on raw untyped XML - regex can be useful

  • Datetime functions - pg 48-51


NodeSets & node operators

  • primary purpose of both XPath & XQuery is the retrieval of NodeSets of 0 or more nodes from a query - XQuery is thus very similar to & supersedes multi-join SQL queries

  • Node operators:

  • ==, !== - tests if 2 nodes do or don’t have the same functional identity - test for duplication of structure , not content values

  • =>- retrieve a node by its ID - left operator is an element or attribute with a value of type IDREF or IDREFS, while the right operand is a node test - the operator de-references the value & returns the referenced nodes that satisfy the node test

  • Is - proposed tests for specific node reference matches - that both point to the same source node

  • Node functions:

    • Local-name(node) - return node name

    • Namespace-uri(node) - return node namespace URI

    • Number(node) - return node value cast as a number

    • Node-equal(node, node) - return true if 2 nodes have the same identity

    • Value-equal(node, node) - return true if 2 nodes have the same value

    • Node-before(node, node)

    • node-after(node, node)

    • Copy(node) - return a deep copy of a node (all its attributes & descendants)

    • Shallow(node) - return a shallow copy of a node (all its attributes but not its descendants)

    • Boolean(node) - cast a node as boolean

    • If-absent(node, simpleType) - if node is an empty sequence, return the content of simpleType

    • If-empty(node, simpleType) - if node is an empty sequence or an element with empty content, return the content of simpleType

  • Note: copies break the implicit link to the source document


Sequence operators

  • a NodeSet is considered a sequence (an array)

  • Sequences allow for more complex data structures - a sequence of XML nodes is a collection of pointers to XML trees, each of which can be a sequence in itself

  • Sequence functions:

  • Position (item, sequence ) - returns integer

  • Last (sequence ) - returns integer

  • Item-at (sequence , decimal ) - return node at a given position

  • Index-of (sequence , type, collation) - return an array of node positions

  • Empty (sequence ) - return boolean

  • Exists (sequence ) - return boolean

  • Identity-distinct(sequence ) - return a NodeSet with all redundant duplicate elements deleted based on node identity

  • Value-distinct(sequence, collation) - return a NodeSet with all redundant duplicate elements deleted based on node value

  • Sort(sequence, collation) -

  • Reverse-sort (sequence, collation) -

  • Insert (sequence, decimal, type) -

  • Sublist-before (sequence, sequence, collation) - return part of the 1st sequence that occurs before the 1st occurrence of the 2nd sequence

  • Sublist-after (sequence, sequence, collation) -

  • Sublist (sequence, decimal, decimal)

  • Sequence-pad-beginning (sequence, decimal, type) -

  • Sequence-pad-end(sequence, decimal, type) -

  • Truncate-beginning (sequence, decimal) -

  • Truncate-end (sequence, decimal)

  • Resize-beginning (sequence, decimal, type) -

  • Resize-end (sequence, decimal, type)

  • Unordered (sequence ) - hint to query optimizer that sequence order is unimportant


Sequence logical operators

  • Sequence-value-equal (sequence, sequence, collation)

  • Sequence-node-equal (sequence, sequence, collation)

  • Union (sequence , sequence )

  • Union-all (sequence , sequence )

  • Intersect (sequence , sequence )

  • Intersect-all (sequence , sequence )

  • Except (sequence , sequence ) - inverse of intersection

  • Except-all (sequence , sequence )


Aggregate functions

  • perform simple math operations on a sequence

  • Count (sequence)

  • Avg (sequence)

  • Max (sequence, collation )

  • Min (sequence, collation )

  • Sum (sequence )


Referencing & filtering functions

  • allow retrieval of XML nodes from a location external to the current node - note: arguments can be sequences - e.g. a sequence of IDREF values

  • These functions establish relational links that are explicit in the data structure for organizational purposes

  • Id (IDREF) - return the NodeSet with matching unique IDs

  • Idref (string) - return the NodeSet with matching unique IDREFs

  • Filter (node) - return a NodeSet consisting of a shallow copy of nodes selected by the expression argument , preserving any interrelationships

  • The Filter() function operates on a tree hierarchy to narrow a query on a NodeSet or extract a summary of tree data

  • E.g. - keep hierarchy, but only the elements selected: Filter (document ("xxx.xml")//(elementX | elementY | elementZ/text()))

  • Document (URIstring) - return a NodeSet consisting of the root node of the referenced document

  • E.g. use the id function to retrieve a specific set of elements :

LET $Root := document("xxx.xml")

LET $NodeSetX := $Root//id("3", "56")

FOR $ElementX IN $NodeSetX

RETURN

<elementY>

{ $ElementX /attributeX }

</elementY>

  • E.g. use the id function with IDREF types that are IDREF attributes defined in the structure schema

FOR $ElementX IN document("xxx.xml")

//ElementX

LET $NodeSetZ := $ElementX //id(@ElementZ)

RETURN

<elementY>

{ $ElementX /attributeX }

{ $NodeSetZ/attributeX }

</elementY>

  • The idref() function works in reverse - if given a specific IDREF string, it will return a list of all elements that have this IDREF - e.g. find all elements that reference the current element

FOR $NodeSetX IN document("xxx.xml")

//ElementX

RETURN

<elementX>

{ $NodeSetX /attributeX }

<elementY>

{

FOR $NodeSetY IN $NodeSetX //idref(@id)

RETURN

{ $NodeSetY /attributeY}

}

</elementY>

</elementX>


Some Use Cases:

Use case 1: querying a standard XML datasource to retrieve a flat hierarchy

  • Include an outer element to wrap results for it to be well formed XML & encase the XQuery statement in curly brackets to differentiate XML content from XQuery - XQuery expressions within curly brackets are converted into strings when the output is an attribute :

<elementX>

{

FOR $NodeSetX IN value-distinct(document ("xxx.xml")/elementX[@attributeX != "5"])

WHERE number($NodeSetX/@attributeX) > 90

RETURN <elementY attributeX=" {$NodeSetX/@attributeX} " />

}

</elementX>


Use case 2: querying a standard XML datasource to retrieve a tree hierarchy

  • E.g. filter NodeSet for matches with 3 possible XPath expression patterns - note: filter returns a shallow copy

<elementX>

{

LET $NodeSetX := document("xxx.xml")

RETURN

filter( $NodeSetX//elementY |

$NodeSetX//elementY/elementZ |

$NodeSetX//elementY/elementZ/text())


}

</elementX>

  • E.g. use the comprehensive XPath search // to locate all of a specific element type within a document

<elementX>

{

FOR $NodeSetX IN document("xxx.xml") //ElementX

RETURN

<elementY>

{ $NodeSetX /@*}

{ $NodeSetX /elementY}

{ count($NodeSetX /elementY)}

{ ($NodeSetX //elementY)[3]/elementZ}

{(($NodeSetX //* AFTER ($NodeSetX //elementY)[1]) BEFORE ($NodeSetX //elementY)[2])}

</elementY>


}

</elementX>

  • // - the comprehensive XPath descendants search can locate all instances of an element within an XML tree

  • /@* - retrieve all attributes of the current source element & add them to a target container element

  • Attribute values are converted to strings to ensure syntax safety


Use case 3: relational querying

  • The temporal-datetime-contains function takes 3 arguments(start date, end date, current date) & tests to see if current date is between the two

  • The LET clause is used to enforce a PK to FK relationship between 2 XML documents

FOR $NodeSetX IN document("xxx.xml") //elementX

LET $NodeSetW := document ("www.xml") //elementW[element_FKID = $NodeSetX/element_FKID]

WHERE temporal-datetime-contains (

date (concat( "15", string( $NodeSetX /element_start_date))),

date (concat( "15", string( $NodeSetX /element_end_date))),

date(currentDatetime()))

AND contains ($NodeSetX/elementY, "xxx")

RETURN

<elementY>

{$NodeSetX/elementA}

{$NodeSetW/elementB}

</elementY>

SORTBY (element_PKID)

  • e.g. to determine which items are in 1 XML document and not the other:

FOR $NodeSetX IN document("xxx.xml") //elementX

WHERE not (SOME $NodeSetY IN document("yyy.xml") //elementY SATISFIES $NodeSetX/elementPKID = $NodeSetY/elementFKID)
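
The not (SOME ... SATISFIES ...) pattern is an anti-join. A Python sketch of the same idea with xml.etree.ElementTree, where the two inline documents are made-up stand-ins for xxx.xml and yyy.xml:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for xxx.xml and yyy.xml
XXX_XML = """
<root>
  <elementX><elementPKID>1</elementPKID></elementX>
  <elementX><elementPKID>2</elementPKID></elementX>
  <elementX><elementPKID>3</elementPKID></elementX>
</root>
"""
YYY_XML = """
<root>
  <elementY><elementFKID>1</elementFKID></elementY>
  <elementY><elementFKID>3</elementFKID></elementY>
</root>
"""

xxx = ET.fromstring(XXX_XML)
yyy = ET.fromstring(YYY_XML)

# Collect every foreign key once, so each elementX needs only a set lookup
foreign_keys = {y.findtext("elementFKID") for y in yyy.iter("elementY")}

# Keep each elementX whose key is matched by no elementY
unmatched = [x.findtext("elementPKID") for x in xxx.iter("elementX")
             if x.findtext("elementPKID") not in foreign_keys]

print(unmatched)
```

Building the key set first avoids rescanning the second document for every elementX, which is the kind of rewrite a query optimizer would be expected to do for the XQuery form.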


Use case 4: full text search

WHERE SOME $NodeSetY IN $NodeSetX// elementY SATISFIES contains( $NodeSetY /text(), "xxx")

OR SOME $NodeSetY IN $NodeSetX// elementY SATISFIES contains( $NodeSetY /text(), $NodeSetW /text())

OR SOME $NodeSetY IN $NodeSetX// elementY SATISFIES contains( string($NodeSetX), $NodeSetW /text())


Use case 5: references between XML documents

  • XML documents can reference each other through the mechanism of IDs & IDREFs

  • The dereferencing operator "->" is used to follow a referential link from an IDREF pointer attribute to a specified element's ID attribute (has a unique value for that attribute)

  • E.g.

<xxx>

{

FOR $NodeSetX IN document ("xxx.xml")// elementX[attributeX = "xxx"], $NodeSetY IN $NodeSetX/@attrib_IDREFy->elementY

WHERE $NodeSetX/elementY/elementZ = "xxx"


RETURN shallow ($NodeSetX/@attrib_IDREFx->elementX )


}

</xxx>


Comparison of SQL to XQuery

  • Designed with different data models (unordered relational vs. portable XML hierarchical fragments )

  • While SQL is more powerful it is bound to a host RDBMS (with the exception of point to point linked queries ), whereas XQuery can formulate queries which act on XML fragments (or RDBMS that can natively expose XML) distributed all over the internet

  • Key differences:

  • SQL operates on tuple-sets from unsorted tables while XQuery operates on simple datatypes , documents, document fragments & document sequences

  • SQL works on tuples of attributes (rows with well defined columns), while XQuery works on XML nodes

  • SQL has no inherent order, with sorting explicitly specified in the query, while XQuery returns data in the order defined in the source documents or specified in the query

  • While SQL relational data is expressed through DDL and used to validate each DML statement , the minimal structure requirement for XQuery data is a fragment of well formed XML, optionally validated with a XML Schema

  • While SQL supports commands to insert, delete & update tuples , XQuery can only read data

  • While SQL can define a view as a query, XSLT can transform a document over which XQuery can run

  • XQuery is suited to the portable and hierarchical XML data model which is more concise when describing complex data structures - unlike SQL, where a minor change in data structure may require a large redesign

  • SQL is less flexible than XQuery FLWR statements because it only deals with tuple sets

  • FLWR expressions always return a NodeSet or simple datatypes

  • All FOR, LET, WHERE & RETURN clauses take expressions as arguments - can include other FLWR expressions - No limit to chaining FLWR expressions, & can handle complex data structures by iterating over a sequence & binding each of its items to a variable

  • A FLWR expression can have multiple FOR & LET clauses, allowing the definition of multiple variables to be manipulated in the WHERE & RETURN clauses or even other FOR or LET statements

  • A FLWR expression is not a program - it is a bindable template of instructions

  • XPath’s role in XQuery is equivalent to that of SQL FROM & WHERE

  • The basic XPath construct is the location path- an expression resembling a readable URI which identifies a portion of an XML document

  • XPath sees the document structure as a tree of nodes, including comments & processing instructions

  • / - points to the root node -returns the whole document

  • //elementX - return a NodeSet of elements no matter where they are in the tree

  • //elementX[6] - return every elementX that is the 6th elementX child of its parent

  • //elementX [@attributeX='xxx'] - equivalent to //elementX with a SQL WHERE clause

  • //elementX [//elementY/@attributeY='xxx']/@attributeX - select an attribute instead of an element

  • An XPath location axis selects nodes to narrow the search - absolute axis (prefixed with "/") starts from the root node, while relative axis starts from the context node

  • XPath offers 13 axes: child, parent, self, descendant, ancestor, descendant-or-self, ancestor-or-self, following, preceding, following-sibling, preceding-sibling, attribute, namespace

  • XPath axis selection , node selection & predicate evaluation make up a node testing step

SELECT DISTINCT tableX.colX

FROM tableX

WHERE tableX.colY < 0

XPath equivalent:

//elementX[elementY < 0]/@number


SELECT tableX.colX

FROM tableX

GROUP BY colX

HAVING SUM(colY) = 0

XPath equivalent:

//elementX[sum(elementY)=0]/@number


SELECT SUM(tableX.colX)

FROM tableX

XPath equivalent:

sum(//elementY)

  • composing new elements - the equivalent to SQL creating new output columns is using XQuery to compose new XML elements in the query output:

SELECT COUNT(*) AS colNew

FROM tableX

...

<elementNew>

{

count(document("xxx.xml")//elementX)

}

</elementNew>
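
The COUNT and SUM aggregates above can be cross-checked in Python with xml.etree.ElementTree. The inline document is a made-up stand-in for xxx.xml, and elementNew is built by hand to mirror the element constructor:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for xxx.xml
doc = ET.fromstring(
    "<root>"
    "<elementX><elementY>1.5</elementY></elementX>"
    "<elementX><elementY>2.5</elementY><elementY>2</elementY></elementX>"
    "</root>"
)

# SELECT COUNT(*) FROM tableX  ~  count(//elementX)
count_x = len(doc.findall(".//elementX"))

# SELECT SUM(colX) FROM tableX  ~  sum(//elementY)
sum_y = sum(float(y.text) for y in doc.iter("elementY"))

# Compose a new output element, like <elementNew>{ count(...) }</elementNew>
element_new = ET.Element("elementNew")
element_new.text = str(count_x)
serialized = ET.tostring(element_new, encoding="unicode")

print(serialized, sum_y)
```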

  • E.g. replacing a SELECT statement with a WHERE clause that has multiple AND clauses - use a variable that assumes every possible value in the NodeSet:

FOR $NodeSetX IN document("xxx.xml")//elementX[elementY/attribX='x']

RETURN

<elementX>

<elementA> { string($NodeSetX/elementY/@attributeX) } </elementA>

<elementB>

{

$NodeSetX/elementY/@attributeY

}

</elementB>

</elementX>



Comparison of XSLT to XQuery

  • Both can process multiple XML datasource document trees & generate a presentation ready document by applying transformations according to pattern matches identified by XPath in the source XML Document

  • XQuery has better mechanisms to work with structured & distributed XML data

  • XSLT provides more refined transformation mechanisms, while XQuery allows simple expressions to query structured & distributed XML data

  • XSLT template rules easily describe changes to a document - XQuery requires cumbersome recursion for complex transformations

  • XSLT can create global variables visible to the whole stylesheet, while XQuery variables are bound to an expression result

  • XSLT can generate XHTML while XQuery can only generate results on the XQuery/XPath model

  • XSLT can recursively apply multiple templates and automatically sort precedence of instructions

  • XQuery is more suited for handling data with strict & repeated formatting in contrast to XSLT's ability to easily describe minor updates in a document

  • XQuery can apply paths on multiple source documents & easily compare or apply set operations on the results - while XSLT can deal with multiple documents, it is difficult to do so simultaneously

  • XQuery can define functions that can be used in XPath expressions, while XSLT can only provide callable templates

  • XQuery can match text across element boundaries, while XSLT can only compare contents of single elements & attributes

  • XQuery can perform set operations on the result of path operations (XSLT applies templates on each element that matches a pattern , but can not filter a NodeSet by comparing it with other node sequences)

  • XQuery can express subsets of items of a path, while XSLT, which uses XPath 1.0, can only refer to sibling items in a path

  • XQuery is best suited for extracting XML fragments from a document , not for transformation of whole documents, adding markup or global restructuring

  • While it is possible to do transformation using XQuery, it is complicated -use XQuery to extract relevant data from huge documents & generate a small XML resultset, then apply XSLT to transform

  • XQuery influence in XSLT 2.0

    • simple manipulation of XML Schema data type content (handled by XPath 2.0) , string content

    • Improved ease of use, interoperability, i18n support

    • Data grouping support

    • Direct use of IDREF attributes to select paths

    • Note: each language has a clear role - XSLT is for transformation to produce presentation , and not for generation of structured resultsets

Posted on Thursday, October 9, 2014 5:32 AM


Comments on this post: The Power of XSLT and XQuery

# re: The Power of XSLT and XQuery
It is a great chance to learn this new information now. - Dr. Thomas G. Devlin MD, PhD
Left by Robert Jacob on Dec 28, 2016 12:53 PM



Copyright © JoshReuben | Powered by: GeeksWithBlogs.net