It seems I produced a couple of typos in my previous mail. At least this needs to be clarified:
Yes, that's the idea. #defineElement: and #defineCData: have in common that both create a new object. While #defineElement: uses no data from the node at all, #defineCData: gets the string-value and hands it over to the constructor.
Ciao, Steffen
On 09.11.2011, 22:25, Steffen Märcker <merkste@web.de> wrote:
Hi Norbert!
Well, sort of. I think I'm starting to get it. Are defineElement: and defineCData: meant to get the content _from_ the element or CDATA? So defineCData: extracts the text from the node?
Yes, that's the idea. #defineElement: and #defineCData: have in common that both create a new object. While #defineElement: uses no data from the node at all, #defineCData: gets the string-value and hands it over to the constructor. (*) The node to which a type is applied can be viewed as a pivot node from which further processing, namely the type's mappings, starts. Thus, it serves as the context node for the XPath expressions.
(*) Actually, the given class name can refer to any binding. In VW this can be classes (of course), Shared Variables and namespaces as well.
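To make the difference concrete, here is a minimal workspace sketch in plain Smalltalk. It is not SimpleXO's implementation; it assumes a node can be modelled as a Dictionary holding its string-value, and the names elementType and cdataType are hypothetical.

```smalltalk
"Sketch only: plain Smalltalk, not SimpleXO's implementation.
 A node is modelled as a Dictionary holding its string-value (assumption)."
| node elementType cdataType |
node := Dictionary new.
node at: #stringValue put: '42'.
"Element type: ignores the node's data and simply instantiates a class."
elementType := [:aNode | OrderedCollection new].
"CData type: hands the node's string-value to a constructor-like message."
cdataType := [:aNode | Number readFrom: (aNode at: #stringValue) readStream].
(elementType value: node) isEmpty.    "true"
(cdataType value: node) = 42.         "true"
```

In both cases the node only acts as the context; the element strategy never reads it, while the CData strategy reads exactly one thing from it, its string-value.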
From that point of view I think that Cdata is the wrong name anyway. CData and PCData are mainly present in the textual form of XML.
I see your point. =) Perhaps we can go even further and use #defineNode:, because a type can be applied not only to elements but to all kinds of XML nodes. And how about #defineStringValue: instead of #defineText:?
But there is actually a third type, available via #defineStruct:. It behaves similarly to the element type but requires that the created objects respond to #at:put:. The default class here is Dictionary. Remember the rectangle example:
(builder defineStruct: 'Rect') mapPath: ('pos' /@ 'x') toType: 'Int'; "... and so on"
Since we now use a struct, we get:
(Dictionary new) at: 'x' put: 2; "... and so on"
This third (and last) type proved to be very useful for rapid prototyping of a mapping. Later in development, the structs can easily be replaced by the actual domain objects.
Did you consider providing a block instead of a constructor: setter? Usually this adds a little complexity, but it opens up a whole set of possible use cases.
Are you thinking of something like a factory block that takes a node and produces a new object? E.g. something like
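The following hypothetical workspace sketch (plain Smalltalk, not SimpleXO code) shows why the struct type only needs #at:put: from its objects. The filling logic never inspects the receiver's class, so a Dictionary during prototyping and, later, any domain object that understands #at:put: are interchangeable. The block name fill and the key/value pairs are assumptions chosen to mirror the rectangle example.

```smalltalk
"Sketch only: a struct-like fill that relies solely on #at:put:."
| values fill rect |
values := OrderedCollection new.
values add: 'x' -> 2; add: 'y' -> 3; add: 'width' -> 4; add: 'height' -> 5.
"Fill any receiver that understands #at:put: and answer it."
fill := [:struct :assocs |
    assocs do: [:each | struct at: each key put: each value].
    struct].
rect := fill value: Dictionary new value: values.
(rect at: 'x') + (rect at: 'width').    "6"
```

Swapping Dictionary new for another receiver would be the only change needed, as long as that receiver implements #at:put:.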
(builder defineFactory: 'Hypothetical' block: [:node | node copy])
This idea looks promising. If it takes the context node, there are indeed plenty of new use cases. =) Perhaps we should allow the binding approach here too, to avoid wrapping existing facilities in blocks, e.g.
(builder defineFactory: 'Hypothetical' class: 'MyFactoryClass' call: #processNode:)
This would make it possible to call SimpleXO parsers for specific nodes and thus enable a more modular design in complex situations...
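A rough sketch of the two factory flavours in plain Smalltalk may help compare them. Neither is an existing SimpleXO API; the node is again assumed to be a Dictionary, and Array with #with: merely stand in for the hypothetical MyFactoryClass and #processNode:.

```smalltalk
"Sketch only: neither flavour is an existing SimpleXO API."
| node blockFactory receiver selector bindingFactory |
node := Dictionary new.
node at: #name put: 'geo'.
"Flavour 1: a block that takes the context node and answers a new object."
blockFactory := [:aNode | aNode copy].
"Flavour 2: a binding plus a selector, invoked via #perform:with:.
 Array and #with: stand in for MyFactoryClass and #processNode:."
receiver := Array.
selector := #with:.
bindingFactory := [:aNode | receiver perform: selector with: aNode].
(blockFactory value: node) == node.            "false: a fresh copy"
(bindingFactory value: node) first == node.    "true"
```

The binding variant avoids wrapping an existing class-side method in a block; internally it can still be reduced to a block that sends #perform:with:.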
[Example on ID resolving]
That sounds interesting but I don't get the example. Can you elaborate on this? Or provide a more concrete example.
The basic idea is the following: a document may contain nodes with ids and other nodes that refer to them by their id. To parse this, we first put all elements with ids into a dictionary under the respective key. Setting referenced values is delayed until all nodes have been parsed. This allows forward references. In general, a document may have several categories of ids, e.g. the attributes 'id' and 'domain:id'. Thus we want a separate keychain for each category. When we call #key: or #reference:, the argument is the name of a keychain.
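The deferred resolution can be sketched in a few lines of plain Smalltalk. This is an illustration of the idea under stated assumptions, not SimpleXO's internals: keychains are modelled as a Dictionary of Dictionaries, and the helper blocks register and refer are hypothetical.

```smalltalk
"Sketch only: plain Smalltalk, not SimpleXO internals."
| keychains pending resolved register refer |
keychains := Dictionary new.
pending := OrderedCollection new.
resolved := OrderedCollection new.
"Store anObject under id in the keychain named chainName (created on demand)."
register := [:chainName :id :anObject |
    (keychains at: chainName ifAbsent: [keychains at: chainName put: Dictionary new])
        at: id put: anObject].
"Remember a reference; aBlock receives the target once all nodes are parsed."
refer := [:chainName :id :aBlock |
    pending add: (Array with: chainName with: id with: aBlock)].
"Document order: both references precede the geo nodes they point to."
refer value: 'geo-keychain' value: 1 value: [:target | resolved add: target].
refer value: 'geo-keychain' value: 2 value: [:target | resolved add: target].
register value: 'geo-keychain' value: 1 value: 'first rect'.
register value: 'geo-keychain' value: 2 value: 'second rect'.
"Another id category may reuse the same id without clashing."
register value: 'comment-keychain' value: 1 value: 'First Rectangle'.
"After parsing: resolve each pending reference against its own keychain."
pending do: [:each |
    (each at: 3) value: ((keychains at: (each at: 1)) at: (each at: 2))].
resolved asArray.    "#('first rect' 'second rect')"
```

Because lookups only happen in the final loop, the order in which ids and references appear in the document does not matter.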
<ex>
  <list>
    <geo ref="1"/>
    <geo ref="2"/>
  </list>
  <geo id="1">
    <comment value="1"/>
    <rect>
      <pos x="2" y="3"/>
      <width>4</width>
      <height>5</height>
    </rect>
  </geo>
  <geo id="2">
    <comment value="2"/>
    <rect>
      <pos x="6" y="7"/>
      <width>8</width>
      <height>9</height>
    </rect>
  </geo>
  <comments>
    <comment cid="1">First Rectangle</comment>
    <comment cid="2">Second Rectangle</comment>
  </comments>
</ex>
Now consider:
rect := builder defineElement: 'Rect' class: 'Rectangle'.
rect mapPath: ('pos' /@ 'x') toType: 'Int'; "... and so on"
(rect mapPath: (ParentAxis /@ 'id') toType: 'Int') key: 'geo-keychain'.
(rect mapPath: (ParentAxis / 'comment' /@ 'value') toType: 'Int') reference: 'comment-keychain'.
comment := builder defineCData: 'Comment'.
(comment mapPath: (AttributeAxis ? 'cid') toType: 'Int') key: 'comment-keychain'.
doc := builder defineElement: 'doc' class: 'Set'.
(doc mapPath: ('ex' / 'list' / 'geo' /@ 'ref') toType: 'Int')
    reference: 'geo-keychain';
    setter: #add:.
(doc mapPath: ('ex' / 'geo' / 'rect') toType: 'Rect') transient.
(doc mapPath: ('ex' / 'comments' / 'comment') toType: 'Comment') transient.
Ignoring my potential typos, we get:
(Set new) add: ((Rectangle new "...") comment: 'First Rectangle'); add: ((Rectangle new "...") comment: 'Second Rectangle').
Please note #transient in the doc's definition. With this setting, the matched nodes are parsed, but the created objects are not set in their parent. When an id is configured via #key:, the mapping is transient by default, since we rarely want to preserve the XML ids.
Although this example is a bit bigger, I think it illustrates how SimpleXO manages to map a complex XML document straight to a much simpler object tree.
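The effect of #transient can be pictured with a small hypothetical sketch in plain Smalltalk (not SimpleXO code): every matched node still produces an object, which remains available, e.g. for a keychain, but only non-transient results are handed to the parent's setter. The names apply and registry are assumptions for illustration.

```smalltalk
"Sketch only: transient results are built but not set in the parent."
| parent registry apply |
parent := Set new.
registry := OrderedCollection new.
apply := [:value :isTransient |
    registry add: value.    "parsed and available, e.g. for a keychain"
    isTransient ifFalse: [parent add: value]].
apply value: 'kept' value: false.
apply value: 'parsed only' value: true.
parent size.      "1"
registry size.    "2"
```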
Hope this gives you further insights!
Best regards, Steffen
PS: Using the external DSL, the mapping can be written as follows:
'element Rect {
    class: Rectangle
    pos/@x >> Int #... and so on
    ../@id >> Int (key: geo-keychain)
    ../comment/@value >> Int (ref: comment-keychain)
}

cdata Comment {
    @cid >> Int (key: comment-keychain)
}

root element Doc {
    class: Set
    ex/list/geo/@ref >> Int (ref: geo-keychain setter: #add:)
    ex/geo/rect >> Rect (transient)
    ex/comments/comment >> Comment (transient)
}'

_______________________________________________
Moose-dev mailing list
Moose-dev@iam.unibe.ch
https://www.iam.unibe.ch/mailman/listinfo/moose-dev