---- On Thu, 04 Mar 2010 12:35:39 -0800 Alexandre Bergel alexandre@bergel.eu wrote ----
Hi,
I exchanged a number of emails with Jaayer and Norbert regarding some improvements to XMLSupport and its port to GemStone. It may be a bit difficult for people to follow this, but I think it is important not to discuss it privately.
I already changed
XMLTokenizer>>nextName .... ^ self fastStreamStringContents: nameBuffer
to
XMLTokenizer>>nextName .... ^ (self fastStreamStringContents: nameBuffer) asSymbol
in the gemstone parser to be more consistent.
Have you noticed any slowdown from this?
No, I didn't run any tests. But if all names are symbols internally, then I guess converting them while reading is the best way to do it.
I added benchmark1 in XMLParserTest. Really simple way to measure progress (or slowdown). On my machine, I have: XMLParserTest new benchmark1 => 2097
Adding "(self fastStreamStringContents: nameBuffer) asSymbol" increases the benchmark result to 2273.
I don't believe this ;) You read them as strings from the stream. If they are managed as symbols, they need to be converted somewhere: if not here, then at some other place. I suspect there are duplicate calls to asSymbol. Could you check the sources?
There is indeed a slowdown. I am not sure where it comes from, however. Executing "XMLParserTest new benchmark1" twice does not return the same result. Actually, the result increases with each execution! I thought that a garbage collection before running the benchmark would help, but apparently it does not.
I believe calling asSymbol on a symbol should have no perceptible cost.
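For anyone who wants to check this in a workspace, here is a minimal sketch. It assumes a Pharo or Squeak image, where a Symbol answers itself to asSymbol and equal strings intern to the same Symbol; the string 'element' is only an illustrative value, not a name taken from the parser.

```smalltalk
"Workspace sketch; evaluate line by line and inspect the results."
| name sym |
name := 'element' copy.     "a fresh String, similar to what a tokenizer buffer yields"
sym := name asSymbol.       "String -> Symbol: looked up (or interned) in the symbol table"
sym asSymbol == sym.        "should answer true: a Symbol answers itself"
name asSymbol == sym.       "should answer true: equal strings map to one Symbol"
```

If this holds, the cost of a redundant asSymbol on an existing Symbol is a single message send, whereas the String-to-Symbol conversion involves a symbol table lookup.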
Cheers, Alexandre
You should run those benchmarks longer, perhaps 600 times instead of 300, to get a more stable result. I loaded your most recent package into a clean image and got results similar to yours, with the current non-converting version being slightly faster. However, in my development image (with all of the changes I have made since my last release), the converting version is slightly faster, and both versions are faster overall. I haven't been able to work much on the parsers and tokenizer yet, but it appears they are still largely string-based, so I am not sure whether making changes like this is a good idea at this point.
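To compare the two versions with less noise, one option is to collect several rounds of the existing benchmark rather than a single number. This is only a workspace sketch, assuming a Pharo or Squeak image and that "XMLParserTest new benchmark1" answers a number, as in the results quoted above; the round count of 5 is an arbitrary illustrative value, and this repeats the whole benchmark rather than changing the repetition count inside benchmark1.

```smalltalk
"Hypothetical workspace sketch: run the benchmark several times and show every result."
| rounds results |
rounds := 5.    "illustrative value, not taken from the thread"
results := (1 to: rounds) collect: [ :i |
    Smalltalk garbageCollect.            "full garbage collection before each round"
    XMLParserTest new benchmark1 ].      "assumed to answer the measured number"
Transcript show: results printString; cr.
```

Looking at all the rounds side by side shows whether the number keeps growing from run to run, as reported earlier in the thread, or settles after a warm-up round.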