Hi all,
Drawing on my work on the Delphi parser and on my
experience as a Java programmer, I will try to add some thoughts to the discussion.
First of all, I am still struggling to keep the Delphi code clean. I try to do this by
sticking to a few rules when converting the "standard PetitParser result" to AST
nodes:
1. When I have an "or" rule ( / or | ), I do not create an AST node for it. I only
create AST nodes for the sub-rules. E.g. annotationTypeElementDeclaration gets no
concrete AST node. I sometimes create an abstract AST node with subclasses for the
sub-rules.
2. When I have a rule that has some syntax around the interesting part, I simply cut out
the syntax and keep the interesting part, e.g. arguments.
3. When I have a list of things, I give back an array of the things, e.g.
expressionList. Note that in the Delphi parser we have some helper methods to do this. For
a complete list, see the convenience category of PPDelphiLexicon and the convenience
category of PPDelphiParser.
4. For most other parse rules, I create an AST node that makes sense of the tokens.
This node is created with onTokens:, because I do not want too much interpretation in my
grammar, and picking elements out of an array by position is not very readable. The AST
node is always named after the rule in the grammar.
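To make the four rules concrete, here is a minimal sketch in Python rather than in Smalltalk/PetitParser. The shape of the raw parse result (nested lists of tokens), the MethodCall node and the function names are my own assumptions for illustration; they are not taken from the Delphi parser.

```python
from dataclasses import dataclass


def expression(raw):
    # Rule 1: an "or" rule gets no node of its own; whatever the
    # matching sub-rule produced is passed through unchanged.
    return raw


def arguments(raw):
    # Rule 2: raw is assumed to be ['(', expression_list, ')'];
    # cut out the brackets and keep only the interesting part.
    return raw[1]


def expression_list(raw):
    # Rule 3: raw is assumed to be [first, [[',', e2], [',', e3], ...]];
    # give back a flat list of the expressions.
    first, rest = raw
    return [first] + [pair[1] for pair in rest]


@dataclass
class MethodCall:
    """Rule 4: a node named after a (hypothetical) methodCall rule."""
    name: str
    arguments: list

    @classmethod
    def on_tokens(cls, tokens):
        # The node, not the grammar, knows which position means what.
        name, args = tokens
        return cls(name, args)


raw_arguments = ['(', ['a', [[',', 'b'], [',', 'c']]], ')']
args = expression_list(arguments(raw_arguments))
call = MethodCall.on_tokens([expression('writeln'), args])
```

The grammar side only hands over tokens; all knowledge about positions stays inside on_tokens, which is the point of rule 4.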
Secondly, there are some semantic rules in Delphi that I do not incorporate into my AST.
For instance, in some places an identifier is defined and in others it is used, but when
generating my AST I make no attempt at all to distinguish between the two. An identifier
is simply a string that follows certain rules; whether it is a valid variable, constant
or whatever, I do not care at that point. Only when I am generating FAMIX do I try to
match names, so that I can connect everything.
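As a rough illustration of that later name-matching step, here is a small Python sketch. The function name, the (name, entity) pairs and the entity strings are hypothetical and this is not the actual FAMIX generation code; the only Delphi fact it relies on is that identifiers are case-insensitive.

```python
def link_references(declarations, uses):
    """Match used names against declared names after parsing.

    declarations: list of (name, entity) pairs collected from the AST
    uses:         list of identifier strings found where names are used
    Returns (links, unresolved).
    """
    # Delphi identifiers are case-insensitive, so compare in lower case.
    by_name = {name.lower(): entity for name, entity in declarations}
    links, unresolved = [], []
    for name in uses:
        entity = by_name.get(name.lower())
        if entity is None:
            # No error: an unknown name is simply left unconnected.
            unresolved.append(name)
        else:
            links.append((name, entity))
    return links, unresolved


links, unresolved = link_references(
    [("Counter", "var:Counter"), ("DoWork", "proc:DoWork")],
    ["counter", "DOWORK", "Missing"],
)
```

A real model would also need scoping, which this sketch ignores completely.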
So this means I do not parse Delphi the same way Delphi does, and I do not intend to
try, because my goal is not to generate code but to generate an analysis model. A lot of
the rules I write are therefore more forgiving than a Delphi parser would be. For
example, a Delphi parser will generate an error when it encounters a call to an
undeclared function; I will not. I also want to include all variants of the various
defines ({$IFDEF}).
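One possible way to keep all variants is sketched below in Python. It is not how the Delphi parser does it; it assumes every directive sits on a line of its own, that the directives are balanced, and it only knows {$IFDEF}, {$IFNDEF}, {$ELSE} and {$ENDIF}. Instead of choosing a branch, every code line is kept and tagged with the conditions under which a compiler would have seen it.

```python
import re

# Matches a line that consists of exactly one conditional directive.
DIRECTIVE = re.compile(
    r"^\s*\{\$(IFDEF|IFNDEF|ELSE|ENDIF)\b\s*([^}]*)\}\s*$", re.IGNORECASE
)


def all_variants(lines):
    """Return (conditions, line) pairs for every non-directive line."""
    conditions = []  # stack of the conditions that are currently open
    result = []
    for line in lines:
        match = DIRECTIVE.match(line)
        if match is None:
            result.append((tuple(conditions), line))
            continue
        kind, symbol = match.group(1).upper(), match.group(2).strip()
        if kind == "IFDEF":
            conditions.append(symbol)
        elif kind == "IFNDEF":
            conditions.append("not " + symbol)
        elif kind == "ELSE":
            # Flip the innermost condition for the other branch.
            top = conditions.pop()
            flipped = top[4:] if top.startswith("not ") else "not " + top
            conditions.append(flipped)
        else:  # ENDIF closes the innermost condition
            conditions.pop()
    return result


source = [
    "{$IFDEF DEBUG}",
    "Log('x');",
    "{$ELSE}",
    "Quiet;",
    "{$ENDIF}",
    "Run;",
]
variants = all_variants(source)
```

Both Log('x'); and Quiet; end up in the result, each tagged with its condition, so the analysis model can cover either build configuration.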
Finally, I know Java is pretty well defined, with a good language spec. But if I were to
write a Java parser, I would sometimes merge rules and/or break them up, especially if I
used PetitParser. We did find a spec for Delphi that was fairly complete, and we took it
as our base. But while parsing, we modified some of the rules so they would fit better.
E.g. we had several types of identifiers, which we all mapped back to identifier. So my
advice would be not to treat the Java language spec as too holy, but to make the syntax
and grammar rules as readable as possible, so that they are easy to match to the Java spec.
Diego