I have implemented a simple preprocessor that can be freely combined with PetitParser parsers.
You can load it by executing the following script (it requires a recent version of PetitParser):


Gofer new
    squeaksource3: 'PetitPreprocessor';
    package: 'PetitPreprocessor';
    load

It replaces every part of the input that matches a given regex with a given string.

Example (the preprocessor removes every T, so this parser accepts an input such as 'FToTo'):
'Foo' asParser preProcess: ('T' asRegex) into: ''

I have successfully used this preprocessor to parse a procedural language. Because that (really old) language limits the line width, a statement can be broken at any position by a carriage return followed by a line number and then a quote, after which the statement continues. I used a regular expression to detect these odd constructs and remove them from the input stream.

I wrote this tool because I needed the start and end positions of parsed expressions in the original (not preprocessed) stream. The preprocessor transforms the stream so that it matches my grammar, and simply sending info to a parser still gives me the real positions relative to the original stream.


Example:

(parser1, ('Foo' asParser) info , parser2) preprocess: myRegex into: myReplacement

It works like token: it returns an object holding the start and stop positions of the parsed data in the original stream, even through one or several nested preprocessors. So if, in the example, parser1 consumes altered data, you still get the correct position of 'Foo' in the original stream.
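To illustrate the idea behind these original-stream positions, here is a small workspace sketch in plain Pharo. It is not PetitPreprocessor's implementation and uses none of its API: it only assumes String>>allRangesOfRegexMatches: from Pharo's regex package (answering 1-based Intervals), and it records, for each character of the preprocessed string, the index it came from in the original. Characters produced by a non-empty replacement are mapped to the start of the match they replace, which is an arbitrary choice made for this sketch.

```smalltalk
"Sketch only, not PetitPreprocessor code: replace regex matches
 and remember where each kept character came from."
| original replacement ranges processed map pos start stop |
original := 'FToTo'.
replacement := ''.    "remove every T, as in the example above"
ranges := original allRangesOfRegexMatches: 'T'.
processed := WriteStream on: String new.
map := OrderedCollection new.    "map at: i = original index of processed char i"
pos := 1.
ranges do: [ :range |
	"copy the untouched characters before this match"
	pos to: range first - 1 do: [ :i |
		processed nextPut: (original at: i).
		map add: i ].
	"emit the replacement, mapped to the start of the match (assumption)"
	replacement do: [ :c |
		processed nextPut: c.
		map add: range first ].
	pos := range last + 1 ].
"copy the tail after the last match"
pos to: original size do: [ :i |
	processed nextPut: (original at: i).
	map add: i ].
processed contents.    "'Foo'"
map asArray.    "#(1 3 5)"
"Suppose the grammar matched positions 2 to 3 ('oo') of the preprocessed string:"
start := map at: 2.    "3"
stop := map at: 3.    "5"
original copyFrom: start to: stop.    "'oTo'"
```

In this sketch, stacking two preprocessors would amount to composing their maps: look a position up in the inner map first, then look the result up in the outer one.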

I have applied this solution to a real case, where it proved really useful for linking parsed, preprocessed data (function source code) back to its location in the files on disk.

Feel free to fix any bugs you find or to propose new features.

--
Guillaume Larcheveque