Fwd: [vwnc] Introducing SimpleXPath - Moose-dev

19 Dec 2011

Hi,
during my work on the XML mapping framework SimpleXO, I realized that the
XML querying code could be useful on its own, too. I factored it out into a
library named SimpleXPath and made it available in the Cincom public
repository under the MIT license. It is similar to the XPath location path
subset (without predicates) but offers some distinct features:
- paths are built as pure Smalltalk expressions
- extended wildcard support
- simple API
Example:
(RootStep // 'source' /@ 'id')  "XPath: //source/@id"
        contextNode: anXmlNode;
        nodesDo: [:node | Transcript show: node stringValue; cr].
The above code prints the 'id' value of all 'source' elements in the XML
document from which anXmlNode is taken.
I am interested in your opinions. I'd be glad if you gave it a try and
shared your thoughts here. Below I've attached the package comment
explaining the API, just in case. ;)
Regards and happy coding!
Steffen
SimpleXPath is an XML query library based on a subset of the XPath 1.0
language. It provides a handy API to construct paths and a parser for
abbreviated XPath location paths without predicates.
See also: http://www.w3.org/TR/xpath/.
I. NodeSets
-----------------
The result of constructing a path or parsing an XPath location path is a
NodeSet. If applied to an XML node, a NodeSet provides access to the nodes
selected by this set.
1. Call #contextNode: to define the node a NodeSet is applied to.
2. Call
        #nodes to get a set of all matched nodes,
        #nodesDo: with a one-argument block to iterate over all matched nodes, and
        #selectNodes: with a one-argument block to select some of the matched nodes.
If you are working with tags that have prefixed names, make sure to
resolve the associated namespaces before using a NodeSet: call
#resolveNamespaces: with a dictionary that maps each prefix to its
namespace.
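A minimal sketch of the calls listed above. anXmlNode, the 'svg' prefix, and
its namespace URI are placeholders chosen for illustration. Passing the
namespace as a plain string, and using a prefixed name directly after #//,
are assumptions as well; the package may expect something different.

```smalltalk
| path namespaces ids nonEmpty |

"Hypothetical prefix-to-namespace mapping; adapt it to your document."
namespaces := Dictionary new.
namespaces at: 'svg' put: 'http://www.w3.org/2000/svg'.

"Build the path, resolve the prefixes, then bind it to a context node."
path := RootStep // ('svg' + 'rect') /@ 'id'.
path resolveNamespaces: namespaces.
path contextNode: anXmlNode.

"All matched nodes at once."
ids := path nodes.

"Iterate over the matched nodes."
path nodesDo: [:node | Transcript show: node stringValue; cr].

"Keep only the matched nodes whose string value is not empty."
nonEmpty := path selectNodes: [:node | node stringValue isEmpty not].
```

Each message is sent to the path variable separately, so the sketch does not
rely on what #resolveNamespaces: or #contextNode: answer.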
II. Path construction API:
------------------------------------
To construct a path programmatically, use the Axis classes and the methods
from the protocol "path construction".
1. Single path steps:
        ChildAxis ? 'name'.                     "select all child nodes tagged with 'name'"
        ChildAxis ? ('prefix' + 'name').        "select all child nodes tagged with 'prefix:name'"
        AttributeAxis ? 'id'.                   "select all attribute nodes tagged with 'id'"
        SelfAxis ? AnyNodeTest.                 "select the context node itself"
        DescendantOrSelfAxis ? CommentTest.     "select all descendant comment nodes"
2. Concatenate steps with #/ :
        (ChildAxis ? 'name') / (ChildAxis ? ('second' + 'name')).
        (ChildAxis ? AnyNodeTest) / (AttributeAxis ? 'id').
        "Often, the axis can be omitted:"
        'name' / ('second' + 'name').           "same as"
                (ChildAxis ? 'name') / (ChildAxis ? ('second' + 'name')).
        AnyNodeTest / (AttributeAxis ? 'id').   "same as"
                (ChildAxis ? AnyNodeTest) / (AttributeAxis ? 'id').
        "Similar to XPath, #/@, #// and #//@ abbreviate attribute and descendant-or-self steps:"
        AnyNodeTest /@ 'id'.                    "same as"
                (ChildAxis ? AnyNodeTest) / (AttributeAxis ? 'id').
        'name' // CommentTest.                  "same as"
                (ChildAxis ? 'name') / (DescendantOrSelfAxis ? AnyNodeTest) / (ChildAxis ? CommentTest).
        'name' //@ 'id'.                        "same as"
                (ChildAxis ? 'name') / (DescendantOrSelfAxis ? AnyNodeTest) / (AttributeAxis ? 'id').
3. Query from the document root with a RootStep:
        RootStep // AnyNodeTest.                "all nodes"
        RootStep //@ 'id'.                      "id of each node"
4. Create the union of two NodeSets with #| :
        (RootStep // 'element') | (RootStep // CommentTest).
        "#\@ abbreviates the union with an attribute step:"
        CommentTest \@ 'id'.                    "same as"
                (ChildAxis ? CommentTest) | (AttributeAxis ? 'id').
5. The wildcards # and * match a single character and multiple characters,
respectively, in local tag names:
        ChildAxis ? 'name_##'.                  "selects e.g. <name_01 />"
        AttributeAxis ? '*_id'.                 "selects e.g. ... svg_id='0x5' ..."
        "NOTE: XPath allows * only for the whole tag name, e.g. //prefix:* "
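Since the abbreviated operators accept plain name strings, wildcard names
should combine with them as well. The sketch below assumes that this works;
the element and attribute names are made up, and anXmlNode is a placeholder
for a node of the document being queried.

```smalltalk
"Intended to print the value of every attribute whose name ends in '_id',
 taken from elements named 'name_' followed by exactly two characters,
 e.g. <name_01 svg_id='0x5' />."
(RootStep // 'name_##' /@ '*_id')
        contextNode: anXmlNode;
        nodesDo: [:node | Transcript show: node stringValue; cr].
```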
III. Parser API:
--------------------
To parse an abbreviated XPath location path, use SimpleXPathParser.
Predicate expressions are not supported.
Call
        #parseString: with the XPath string to parse that string and obtain a NodeSet, and
        #validateString: to check whether the string is free of syntax errors.
If parsing fails, a SyntaxError is raised that gives the error position
and a brief description.
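A sketch of how parsing and error handling might be combined. Whether
#parseString: is sent to the class or to an instance is not stated above; the
instance-side form used here is an assumption, and anXmlNode is a
placeholder. The error is reported through the standard exception protocol
(#description) only, since the accessor for the error position is not named.

```smalltalk
| parser nodeSet |

"Assumption: the parsing methods live on the instance side."
parser := SimpleXPathParser new.

"Answer the parsed NodeSet, or nil if the string is not a valid path."
nodeSet := [parser parseString: '//source/@id']
        on: SyntaxError
        do: [:error |
                Transcript show: 'Invalid path: ', error description; cr.
                nil].

"Apply the parsed path exactly like a programmatically built one."
nodeSet isNil ifFalse: [
        nodeSet
                contextNode: anXmlNode;
                nodesDo: [:node | Transcript show: node stringValue; cr]].
```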