I have implemented a simple preprocessor that can be combined freely with PetitParser parsers.
You can load it by executing the following script (requires a recent version of PetitParser):
Gofer new
    squeaksource3: 'PetitPreprocessor';
    package: 'PetitPreprocessor';
    load
It replaces every substring matching a regular expression with a given string.
Example (the preprocessor removes every T):
    'Foo' asParser preProcess: ('T' asRegex) into: ''
I successfully parsed a procedural language with this preprocessor.
Because of the column-width limit in the parsed language, the parser
could encounter a carriage return at any position, followed by the line
number (it's a really old language), then a quote, then the rest of the
statement. So I used a regular expression to detect these odd
constructs and remove them from my input stream.
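The exact layout of those markers is not spelled out here, so the following is only a hypothetical sample: a carriage return, one or more digits, then a quote, in the middle of a made-up statement. It uses core Pharo regex helpers (`allRangesOfRegexMatches:` and `copyWithRegex:matchesReplacedWith:`) rather than PetitPreprocessor itself, just to check what such a pattern would strip; handing the same pattern to the preprocessor would follow the `preProcess:into:` form of the T example.

```smalltalk
"Hypothetical marker: CR, a line number, then a quote (an assumption, not the real grammar)."
| marker source |
marker := (String with: Character cr), '\d+'''.
"Hypothetical statement broken by such a marker."
source := 'MOVE A', (String with: Character cr), '0042'' TO B'.
"One marker is found in the sample."
(source allRangesOfRegexMatches: marker) size.
"Removing it yields the statement the grammar expects: 'MOVE A TO B'."
source copyWithRegex: marker matchesReplacedWith: ''.
```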
I wrote this tool because I needed to know the start and end positions
of parsed expressions in the original (not preprocessed) stream. It
transforms the stream so that it matches my grammar, and it gives me the
real position relative to the original stream simply by sending info to
a parser.
Example:
    (parser1, ('Foo' asParser) info , parser2) preprocess: myRegex into: myReplacement
It works like token: it returns an object containing the start and stop
positions of the parsed data relative to the original stream (even
through one or several nested preprocessors, of course). So if, in the
example, parser1 consumes altered data, you still get the right position
in the original stream for your parser.
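One way to picture how such positions can be recovered is an offset map built while rewriting: for each character of the preprocessed text, remember the index it came from in the original. The sketch below is written in plain Pharo and is an assumption about the general technique, not PetitPreprocessor's actual implementation; the names `map` and `cursor` are hypothetical, and mapping replacement characters to the start of their match is an arbitrary choice.

```smalltalk
"Sketch: preprocess with an offset map (plain Pharo, not the PetitPreprocessor API)."
| original regex replacement output map cursor |
original := 'FTooT'.
regex := 'T'.
replacement := ''.
output := WriteStream on: String new.
map := OrderedCollection new.  "map at: k = original index of output character k"
cursor := 1.
(original allRangesOfRegexMatches: regex) do: [ :range |
    "Copy the untouched text before the match, recording each character's origin."
    cursor to: range first - 1 do: [ :i |
        output nextPut: (original at: i).
        map add: i ].
    "Emit the replacement; its characters point at the start of the match."
    replacement do: [ :c |
        output nextPut: c.
        map add: range first ].
    cursor := range last + 1 ].
"Copy whatever follows the last match."
cursor to: original size do: [ :i |
    output nextPut: (original at: i).
    map add: i ].
output contents.  "'Foo'"
map asArray.  "#(1 3 4)"
"A span parsed at 2..3 in the preprocessed text ('oo') lies at 3..4 in the original."
(map at: 2) -> (map at: 3).
```

Stacking several preprocessors would then amount to composing such maps, looking each position up in the innermost map first.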
This solution has been applied to a real case and has proved really
useful for linking parsed, preprocessed data (function source code) back
to the filesystem.
Feel free to fix any bug you find or to propose new features.
--
Guillaume Larcheveque