> a single class called PP...Grammar. However, the Java grammar has many
> rules, and including all of them in a single class does not seem like
> the right approach.

Sure, depending on the size and structure of the grammar you might
want to split it into multiple classes.

> For example, I now have a class called PPJavaLexicon, in which I cover
> the rules for finding tokens and comments (i.e. the lexical structure
> [1]). Next, I would continue with types, values, and variables [2]. So
> I would create another class that references PPJavaLexicon and uses
> the rules defined there to define the new ones. Something like:
>
> PPJavaTypes>>typeVariable
>     ^ ppJavaLexicon identifier

Yes, that's a possibility that works well. It may be better to use the
accessor #productionAt: to access the cached productions of a
different grammar; otherwise you end up with much larger grammars than
necessary.
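
For instance, the typeVariable rule quoted above could be written roughly
like this. It is only a sketch: it assumes that ppJavaLexicon holds an
instance of PPJavaLexicon and that this class defines a production named
identifier, as in the question.

```smalltalk
PPJavaTypes>>typeVariable
    "Assumption: ppJavaLexicon is an instance of PPJavaLexicon.
     Ask it for its cached #identifier production instead of
     sending #identifier, which would build the rule again."
    ^ ppJavaLexicon productionAt: #identifier
```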

> Is this a good approach to splitting a grammar into several classes,
> or would you suggest something different?

The problem with splitting up the grammar as you propose is that it is
no longer as easy (though still possible) to use subclassing to
customize the grammar with different production actions.
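
To illustrate what I mean by customizing through subclassing, here is a
small sketch. PPJavaTypesParser is a made-up subclass of your PPJavaTypes,
and the association it answers is just a placeholder for a real node.

```smalltalk
PPJavaTypesParser>>typeVariable
    "Hypothetical subclass of PPJavaTypes: reuse the inherited rule
     and attach a production action to it with ==>."
    ^ super typeVariable ==> [ :id | #typeVariable -> id ]
```

This only reaches the rules defined in PPJavaTypes. To attach actions to
the rules in PPJavaLexicon you would presumably need a second subclass
there, and the types grammar would have to reference that subclass
instead of the plain lexicon.
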
Another (and more traditional) approach is to use a separate lexer.
You can see that in TextLint (check on squeaksource.com). There we
have different lexers for plain text, LaTeX and HTML; and a parser for
a 'natural language' of words, sentences, and paragraphs (very simple)
that can be composed in different ways. For Java such a split probably
doesn't make sense, but it is a good example of PetitParser being very
flexible to different requirements.
Also you might want to look at my work on language embedding,
especially <http://scg.unibe.ch/archive/papers/Reng09cLanguageBoxes.pdf>.
There we programmatically compose different languages modeled as
PPCompositeParser instances at specific join-points.
Lukas
--
Lukas Renggli
www.lukas-renggli.ch