[Moose-dev] Re: The future of Opax in XML-Support

17 Nov 2010

---- On Tue, 16 Nov 2010 02:27:27 -0800 Tudor Girba wrote ----
...
Hi,
On 15 Nov 2010, at 15:29, jaayer wrote:
...
---- On Mon, 15 Nov 2010 00:45:04 -0800 Tudor Girba wrote ----
...
Hi,
Thanks for this nice overview. I am happy that XML handling gets a bit more traction in Smalltalk :). I took a quick look and your XMLPluggableElementFactory sounds quite interesting, and it's great that it supports namespaces.
Regarding Opax, your analysis is not quite right.

You do not need to subclass the OPOpaxHandler.

Really? So if I have a pre-existing SAX parser, say SVGSAXParser, there is a way to make it support Opax-like functionality without changing its superclass to be that of OPOpaxHandler?
:). No, and you are not supposed to. The reason for subclassing the SAXHandler is to accommodate the stream of XML nodes in methods like startElement:... . The OPOpaxHandler overrides these methods, creates the corresponding nodes, and dispatches the handling to them.
Looking back at my earlier posts, I realize now that I wasn't clear. I understand you can use OPOpaxHandler directly without subclassing it, and that you really only need to subclass OPGenericElement and override #xmlTags for the magic to happen. What I was referring to was a situation where you wanted to extend another, already-existing SAX parser to have Opax-like functionality. Take a hypothetical pre-existing SAXDocBookParser as an example; you would have to change it to inherit from OPOpaxHandler rather than SAXHandler or the SAXHandler subclass it presently inherits from, and probably do some overriding and super sending of the handlers to make it work with Opax properly.
Another potential problem with Opax is that it blindly enumerates *all* subclasses of OPGenericElement looking for any that expresses interest in handling a particular element by inspecting the #xmlTags collection of each. Apart from the lack of caching and the performance degradation this will cause as more subclasses of OPGenericElement are added, there is the issue of conflict resolution. What if you add a subclass of OPGenericElement named UsernameElement whose #xmlTags collection contains "name", and then I add a NameElement class whose #xmlTags collection also contains "name"? Which class should be used? And what if I don't add my own NameElement class or any class with "name" in its #xmlTags collection at all, because I want "name" elements ignored altogether? They'll still get handled anyway, because your UsernameElement class is still there and its #xmlTags collection still contains "name".
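To make the conflict concrete, here is a rough sketch in Python rather than Smalltalk. It is not Opax code: GenericElement, xml_tags, candidates_for and build_registry are hypothetical stand-ins for OPGenericElement, #xmlTags and the lookup described above. It only shows what walking every subclass on each lookup looks like, and how two classes claiming "name" become ambiguous.

```python
class GenericElement:
    """Hypothetical stand-in for OPGenericElement."""
    xml_tags = ()  # tags this class claims, analogous to #xmlTags


class UsernameElement(GenericElement):
    xml_tags = ("name",)


class NameElement(GenericElement):
    xml_tags = ("name",)


def all_subclasses(cls):
    """Collect every direct and indirect subclass of cls."""
    found = []
    for sub in cls.__subclasses__():
        found.append(sub)
        found.extend(all_subclasses(sub))
    return found


def candidates_for(tag):
    """Uncached lookup: rescans the whole hierarchy for each element."""
    return [c for c in all_subclasses(GenericElement) if tag in c.xml_tags]


def build_registry():
    """One-time scan that records every class claiming each tag."""
    registry = {}
    for cls in all_subclasses(GenericElement):
        for tag in cls.xml_tags:
            registry.setdefault(tag, []).append(cls)
    return registry


# Tags claimed by more than one class have no obvious winner.
conflicts = {tag: classes
             for tag, classes in build_registry().items()
             if len(classes) > 1}
```

With both classes loaded, candidates_for("name") returns two classes, and nothing in the lookup itself says which one should win. Removing NameElement does not make "name" elements ignored either, since UsernameElement is still found.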
...
...
...

The goal of Opax is not to replace DOM, but to enhance SAX. It's true that at the moment it still creates a tree, but this should be changed to make it optional. The original idea of Opax was to dispatch everything, including the factory decision, to the Element, but the implementation has lagged behind the wishes.

To be perfectly honest with you, I did not before, nor do I now, fully understand what Opax is supposed to do. I understand that at the very least it involves mapping elements in an XML document to different kinds of objects, but how it is ultimately supposed to go about doing this remains unclear and appears to still be in flux.
It's the same as with your XMLDOMParser: you do not subclass it, you just parameterize it. In your case, the parametrization is quite nice.
...
...

Opax is tiny: 3 classes + 4 test classes

True, but it takes up two top-level class categories and still adds more weight to the package, and by your own admission it stands to only get bigger.
...

OPGenericElement should simply be made a subclass of XMLElement, and we would have the compatibility we would need.

Right, but then it would be a DOM node, and you said you wanted Opax to avoid DOM, or at least the DOM parser.
Yes and no. It would be a DOM node, but this does not mean that we have to store all of them in a tree if I do not need them.
...
...

I do not see the reasons why DOM should be preferred to SAX. The problem with DOM is that it always creates XML elements :). When you have large XML files, you often do not want to load them, but just to process them directly. This is the goal of SAX, but then SAX is procedural. Opax should be used to transform SAX into an object-oriented handling.

So the goal is something that only produces objects for certain portions of a document, but ignores the rest? I think this could be better built on top of the DOM parser, perhaps as a partial DOM parser.
With the new Factory, the DOMParser is similar to what the OPOpaxHandler is doing. The difference is that the Factory is doing the mapping, while in Opax the mapping is done on the class side of the element.
...
...
Instead of removing it, I would suggest a different approach. Let's make it focus on the SAX parsing:

We could easily get it to use the XMLNodeFactory
We could subclass OPGenericElement from XMLElement.

I think an approach that used more metaprogramming and dependency injection rather than inheritance would be better. Maybe something that uses reflection to query injected classes to be used for elements and then fills their instance variables based on the names of those variables and the names of the child elements and attributes that the elements the class has been mapped to contain. In other words, you wouldn't need to subclass OPGenericElement OR XMLElement; just have instance variables in the injected class with names matching, roughly, the attribute and child element names of the elements the class has been mapped to. You could also support explicit conversion instructions. For example, something that could be told to map "timestamp" elements to the DateAndTime class and to convert their content using fromString:.
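As a rough illustration of that idea, here is a sketch in Python rather than Smalltalk. Everything in it is hypothetical: LogEntry, populate, the mapping table and the converters table are made-up names, and datetime.fromisoformat merely stands in for DateAndTime fromString:. The injected class inherits from nothing XML-related; its instance variable names are matched against attribute and child element names.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


class LogEntry:
    """Plain class, no XML superclass; only its variable names matter."""

    def __init__(self):
        self.user = None
        self.timestamp = None
        self.message = None


def populate(cls, element, converters=None):
    """Fill an instance of cls from matching attributes and child elements."""
    converters = converters or {}
    obj = cls()
    for name in list(vars(obj)):          # reflect over instance variables
        if name in element.attrib:
            raw = element.attrib[name]
        else:
            child = element.find(name)
            if child is None:
                continue                  # no match: leave the default
            raw = (child.text or "").strip()
        convert = converters.get(name)    # optional explicit conversion
        setattr(obj, name, convert(raw) if convert else raw)
    return obj


# Injected configuration: which class handles which element, and how
# particular values are converted.
mapping = {"entry": LogEntry}
converters = {"timestamp": datetime.fromisoformat}

doc = ET.fromstring(
    '<log><entry user="doru">'
    "<timestamp>2010-11-15T15:29:00</timestamp>"
    "<message>hi</message>"
    "</entry><other/></log>"
)
entries = [populate(mapping[e.tag], e, converters)
           for e in doc if e.tag in mapping]
```

Elements without an entry in mapping, such as "other" here, are simply skipped, so ignoring an element means not mapping it.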
I do not see what would be gained with this because I do not see for what else I could use these classes. In any case, I would not go the path of playing with instance variables as long as there are simpler ways.
All in all, I think that the DOM is starting to do a good job at creating a tree. I would propose moving Opax towards an on-the-fly analysis of the parsed tree, but without storing it (most of the time you basically only need the current stack, that is, the path to the root).
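A rough sketch of what keeping only the current stack could look like, written in Python with the standard xml.sax module rather than in Smalltalk. PathHandler and scan are hypothetical names, and this is not how Opax is implemented; it only shows that the path to the root is available at every event without a tree being stored.

```python
import xml.sax


class PathHandler(xml.sax.ContentHandler):
    """Tracks only the open elements, never the whole document."""

    def __init__(self, interesting):
        super().__init__()
        self.interesting = interesting  # tag names we care about
        self.stack = []                 # current path to the root
        self.max_depth = 0              # deepest the stack ever got
        self.hits = []                  # paths of interesting elements

    def startElement(self, name, attrs):
        self.stack.append(name)
        self.max_depth = max(self.max_depth, len(self.stack))
        if name in self.interesting:
            self.hits.append("/".join(self.stack))

    def endElement(self, name):
        self.stack.pop()                # closed elements are forgotten


def scan(xml_text, interesting):
    """Parse xml_text once and return the handler with its results."""
    handler = PathHandler(interesting)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler


result = scan(
    "<book><chapter><title>A</title></chapter>"
    "<chapter><title>B</title></chapter></book>",
    {"title"},
)
```

Memory use here is bounded by the nesting depth (max_depth) rather than by the size of the document.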
Cheers, 
Doru
-- 
www.tudorgirba.com
"Sometimes the best solution is not the best solution."

Moose-dev mailing list 
Moose-dev@iam.unibe.ch 
https://www.iam.unibe.ch/mailman/listinfo/moose-dev
