Hey Thierry,
Thank you for your input!
On 2 April 2015 at 16:59, Thierry Goubier <thierry.goubier(a)gmail.com> wrote:
Hi Jan,
2015-04-02 16:36 GMT+02:00 Jan Kurš <kurs(a)iam.unibe.ch>ch>:
Hi All,
I have pushed a new version for indentation sensitivity into the
PetitParser. If I am not mistaken, with the new changes, you can parse
languages such as Python, F#, Haskell, YAML or Markdown. There are proof of
concept implementations in the PetitParser repository. YAML and Markdwon
are almost complete, for Python only the indentation-sensitive rules.
There is a small introduction to the indentation parsing with PetitParser
(
http://scg.unibe.ch/research/indentParsing), let me know, if you are
missing something or if you want to know something more. There are also
many examples of the indentation sensitive grammars in the repository.
This is interesting. Thanks for the description and the link.
One note in the end: isn't it that in fact what you are eating with the
indentation stack is more than whitespace (i.e. bullet points)? See last
paragraph, because your approach benefit is that you can use more than
whitespace as indent, anything arbitrary would do, if I have understood
correctly.
Oh, you are absolutely right, you can consume any parsing expression with
the indentation stack (even though I would prefer nothing more complicated
than regex there :).
I did not mentioned it in the link above. But for example in Markdown, one
use the indentation stack to recognize nesting of quoted blocks and lists,
that starts with '>' and with whitespaces. In markdown, each quoted block
or nested list pushes to the indentation stack and all the expressions on
the indentation stack are evaluated in a sequence at the beginning of each
line. So IMO you understand that correctly.
If you wonder about performance, the indentation
sensitive grammar is
approximately five times slower, because the position and an extra
indentation stack is remembered instead of the simple position. If you
don't use indentation, there is no slowdown. I am working on improvements,
any ideas are welcome :)
Totally out of the blue: your typical indentation parsing expression is a
regex type of thing. What about compiling it as a SmaCC scanner? Such a PP
parser would be equivalent to a Lexing automaton, and good compilers exist
for such automatons.
Good point, I do plan to do something like this for any parsing expression
to get some performance. And it is a good idea to use SmaCC infrastructure
for this.
Yet, the main performance problem with indentation expressions is to manage
the indentation stack. That stack needs to be memoized (before choices,
sequences, predicates) and restored if necessary, which takes most of the
time. I hope some copy-on-demand will lower the overhead.
Thierry
Jan
If you don't like new changes, you can still stable version
(ConfigurationOfPetitParser loadStable) or version 1.11.
Regards,
Jan
_______________________________________________
Moose-dev mailing list
Moose-dev(a)iam.unibe.ch
https://www.iam.unibe.ch/mailman/listinfo/moose-dev
_______________________________________________
Moose-dev mailing list
Moose-dev(a)iam.unibe.ch
https://www.iam.unibe.ch/mailman/listinfo/moose-dev