Hi Norbert!
Well, sort of. I think I'm starting to get it.
defineElement: and defineCData: mean that you get the content _from_ the
element or cdata? So defineCData: extracts the text from the Node?
Yes, that's the idea. #defineElement: and #defineCData: have in common that
both create a new object. While #defineElement: uses no data from the node
at all, #defineCData: gets the string-value and hands it over to the
constructor.
(*) The node to which a type is applied can be viewed as a pivot node from
where further processing, namely the type's mappings, starts. Thus, it
serves as the context node for the XPath expressions.
(*) Actually, the given class name can refer to any binding. In VW this
can be classes (of course), Shared Variables and namespaces as well.
From that point of view I think that Cdata is the wrong name anyway.
CData and PCData are mainly present in the textual form of XML.
I see your point. =) Perhaps we can go even further and use #defineNode:,
because a type can be applied not only to elements, but all kinds of xml
nodes. And how about #defineStringValue: instead of #defineText:?
But there is actually another type, available via #defineStruct:. It
behaves similarly to the element type but requires that the created objects
respond to #at:put:. The default class is Dictionary here. Remember the
rectangle example:
(builder defineStruct: 'Rect')
mapPath: ('pos' /@ 'x') toType: 'Int';
"... and so on"
Since we use struct now, we get:
(Dictionary new)
at: 'x' put: 2;
"... and so on"
This third (and last) type proved to be very useful for rapid prototyping
of a mapping. Later in development, the structs can easily be replaced by
the actual domain objects.
Did you consider providing a block instead of having a constructor:
setter? Usually this adds a little complexity but opens up a whole set of
possible use cases.
Are you thinking of something like a factory block that takes a node and
produces a new object? E.g. something like
(defineFactory: 'Hypothetical' block: [:node | node copy])
This idea looks promising. If it takes the context node, there are indeed
plenty of new use cases. =) Perhaps we should allow the binding approach
here too, to avoid wrapping existing facilities in blocks, e.g.
(defineFactory: 'Hypothetical' class: 'MyFactoryClass' call: #processNode:)
This would make it possible to call SimpleXO parsers for specific nodes and
thus allow a more modular design in complex situations...
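To make the two variants concrete, here is a minimal sketch in plain
Smalltalk. It is not SimpleXO code: #defineFactory:... does not exist yet,
and all names below (blockFactory, bindingFactory, receiver, selector) are
made up for illustration. OrderedCollection and #with: merely stand in for
'MyFactoryClass' and #processNode:. The point is only that a binding plus a
selector can be wrapped once, internally, so that both variants are called
the same way with the context node.

```smalltalk
"Sketch only, under the assumptions stated above."
| node blockFactory receiver selector bindingFactory fromBlock fromBinding |
node := 'context node'.   "stand-in for the real XML node"

"Variant 1: the user supplies a block that takes the context node."
blockFactory := [:aNode | aNode copy].

"Variant 2: the user supplies a binding and a one-argument selector.
 The builder wraps them once, so callers need not write the block."
receiver := OrderedCollection.   "placeholder for MyFactoryClass"
selector := #with:.              "placeholder for #processNode:"
bindingFactory := [:aNode | receiver perform: selector with: aNode].

"Both factories are now invoked uniformly."
fromBlock := blockFactory value: node.
fromBinding := bindingFactory value: node.
```

Here fromBlock is a copy of the node, and fromBinding is whatever the
placeholder class answers, in this case an OrderedCollection holding the
node.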
[Example on ID resolving]
That sounds interesting but I don't get the example. Can you elaborate
on this? Or provide a more concrete example.
The basic idea is the following: A document may contain nodes with ids and
other nodes that refer to them by their id. To parse this, we first put
all elements with ids in a dictionary at the respective key. Setting
referenced values is delayed until all nodes have been parsed. This allows
forward references. In fact, a document may in general have several
categories of ids, e.g. the attributes 'id' and 'domain:id'. Thus we want
to have a separate keychain for each category. When we call #key: or
#reference:, the argument is the name of a keychain.
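The delayed resolution can be sketched in plain Smalltalk as follows. This
is not how SimpleXO implements it, just an illustration of the idea; the
variables keychains, pending and result are made up, and strings stand in
for the parsed objects. References are recorded as blocks and only
evaluated once every key has been registered.

```smalltalk
"Sketch only: illustrates keychains plus delayed references."
| keychains pending result |
keychains := Dictionary new.        "keychain name -> (id -> object)"
pending := OrderedCollection new.   "delayed reference resolutions"
result := OrderedCollection new.

"References are met first (forward references). Nothing is looked up
 yet; we only remember what has to be done later."
#(1 2) do: [:ref |
    pending add: [result add: ((keychains at: 'geo-keychain') at: ref)]].

"Later the keyed nodes are parsed and registered in their keychain."
(keychains at: 'geo-keychain' ifAbsentPut: [Dictionary new])
    at: 1 put: 'first rect';
    at: 2 put: 'second rect'.

"After all nodes have been parsed, the delayed references are resolved."
pending do: [:each | each value].
```

Afterwards result holds 'first rect' and 'second rect', in the order of the
references. A second category of ids would simply get its own entry in
keychains.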
<ex>
<list>
<geo ref="1"/>
<geo ref="2"/>
</list>
<geo id="1">
<comment value="1"/>
<rect>
<pos x="2" y="3"/>
<width>4</width>
<height>5</height>
</rect>
</geo>
<geo id="2">
<comment value="2"/>
<rect>
<pos x="6" y="7"/>
<width>8</width>
<height>9</height>
</rect>
</geo>
<comments>
<comment cid="1">First Rectangle</comment>
<comment cid="2">Second Rectangle</comment>
</comments>
</ex>
Now consider:
rect := builder defineElement: 'Rect' class: 'Rectangle'.
rect
mapPath: ('pos' /@ 'x') toType: 'Int';
"... and so on"
(rect mapPath: (ParentAxis /@ 'id') toType: 'Int')
key: 'geo-keychain'.
(rect mapPath: (ParentAxis / 'comment' /@ 'value') toType: 'Int')
reference: 'comment-keychain'.
comment := builder defineCData: 'Comment'.
(comment mapPath: (AttributeAxis ? 'cid') toType: 'Int')
key: 'comment-keychain'.
doc := builder defineElement: 'doc' class: 'Set'.
(doc mapPath: ('ex' / 'list' / 'geo' /@ 'ref') toType: 'Int')
reference: 'geo-keychain';
setter: #add:.
(doc mapPath: ('ex' / 'geo' / 'rect') toType: 'Rect')
transient.
(doc mapPath: ('ex' / 'comments' / 'comment') toType: 'Comment')
transient.
Ignoring my potential typos, we get:
(Set new)
add: ((Rectangle new "...") comment: 'First Rectangle');
add: ((Rectangle new "...") comment: 'Second Rectangle').
Please note #transient in the doc's definition. With this setting, the
matched nodes are parsed, but the created objects are not set in their
parent. When configuring an id via #key:, the mapping is transient by
default, since we rarely want to preserve the XML ids.
Although this example is a bit bigger, I think it illustrates how SimpleXO
manages to map a complex XML document straight to a much simpler object
tree.
Hope this gives you further insights!
Best regards,
Steffen
PS: Using the external DSL, the mapping can be written as follows:
'element Rect {
class: Rectangle
pos/@x >> Int
#... and so on
../@id >> Int (key: geo-keychain)
../comment/@value >> Int (ref: comment-keychain)
}
cdata Comment {
@cid >> Int (key: comment-keychain)
}
root element Doc {
class: Set
ex/list/geo/@ref >> Int (ref: geo-keychain setter: #add:)
ex/geo/rect >> Rect (transient)
ex/comments/comment >> Comment (transient)
}'