I have implemented a simple preprocessor that can be freely combined with
PetitParser parsers.
You can get it by executing the following script (requires a recent version
of PetitParser):
    Gofer new
        squeaksource3: 'PetitPreprocessor';
        package: 'PetitPreprocessor';
        load
It replaces every match of a regular expression in the input with a
provided string.
Example (the preprocessor removes every T before 'Foo' is matched):
    'Foo' asParser preProcess: ('T' asRegex) into: ''
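To make the effect of that example concrete, here is a minimal sketch that
performs the same substitution by hand, using only the plain Regex string
extension and a PetitParser literal parser. It is not how the preprocessor
is implemented (it does not track positions at all); the input 'FTooT' and
the variable name cleaned are just assumptions for illustration.

```smalltalk
| cleaned |
"Remove every T from the input, as the preprocessor in the example would."
cleaned := 'FTooT' copyWithRegex: 'T' matchesReplacedWith: ''.
"cleaned is now 'Foo', so the plain literal parser can match it."
'Foo' asParser parse: cleaned.
```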
I successfully parsed a procedural language with this preprocessor. Because
of the column width limit in the parsed language, the parser could
encounter a carriage return at any position, followed by the line number
(it's a really old language), then a quote followed by the rest of the
statement. So I used a regular expression to detect these bizarre
constructs and remove them from my input stream.
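As a rough illustration of that kind of cleanup, the sketch below joins a
statement that was split by such a continuation. The exact layout is an
assumption on my side for the example: the line number is taken to be a
plain run of digits, and the statement text and variable names are
hypothetical. Only the standard Regex string extension is used here, not
the preprocessor itself.

```smalltalk
| continuation source joined |
"Assumed continuation marker: carriage return, digits, then a quote."
continuation := String cr , '[0-9]+'''.
"Hypothetical statement broken in the middle of the word TO."
source := 'MOVE A T' , String cr , '00120''O B'.
"Dropping the marker restores the statement: 'MOVE A TO B'."
joined := source copyWithRegex: continuation matchesReplacedWith: ''.
```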
I wrote this tool because I needed to know the start and stop positions of
parsed expressions in the original stream (not the preprocessed one). It
transforms the stream so that it matches my grammar, and it can still give
me the real position relative to the original stream: simply send *info* to
a parser.
Example:
    (parser1, ('Foo' asParser) info, parser2) preProcess: myRegex into: myReplacement
It works in the same way as token: it provides an object containing the
start and stop positions of the parsed data relative to the original
stream. This holds even inside one or several preprocessors, so if parser1
in the example consumes altered data, you still get the right position in
the original stream for your parser.
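The idea of mapping a position back to the original stream can be sketched
as follows. This is only an illustration under simplifying assumptions, not
the code of the package: it handles the deletion case only (empty
replacement), the block toOriginal is a hypothetical helper, and it relies
on matchingRangesIn: from the Regex package of the image.

```smalltalk
| original removed toOriginal |
original := 'FTooT'.
"Ranges of the original text that a preprocessor removing T would delete."
removed := 'T' asRegex matchingRangesIn: original.
"Hypothetical helper: map an index in the preprocessed text ('Foo')
 back to the matching index in the original text."
toOriginal := [ :index | | position |
	position := index.
	removed do: [ :range |
		"Every deleted range at or before the current position shifts it right."
		range first <= position
			ifTrue: [ position := position + range size ] ].
	position ].
toOriginal value: 2.   "3: the first o of 'Foo' is at index 3 of 'FTooT'"
```

With a non-empty replacement the shift per match would be the difference
between the match length and the replacement length, which this sketch
does not cover.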
This solution has been applied to a real case and has proved really useful
for linking parsed, preprocessed data (function source code) back to the
filesystem.
Feel free to fix any bugs you find or to propose new functionality.
--
*Guillaume Larcheveque*