14.1.1 Starting small with parsing names
As a first task, we parse the identifier names. Looking closely at the MSE grammar, we can find two distinct definitions for identifier names:
ELEMENTNAME := letter ( letter | digit ) * ( "." letter ( letter | digit ) * )
SIMPLENAME := letter ( letter | digit ) *
Let’s start with SIMPLENAME. The grammar definition says that a valid name must start with a letter, which can then be followed by any number of letters or digits. The same specification can be represented graphically:
The translation to PetitParser looks like:
simpleName := #letter asParser , ( #letter asParser / #digit asParser ) star.
It is that easy. It reads almost like the abstract grammar. In essence, the grammar production is mapped onto a parser object, which in this case we store in the simpleName variable. The parser object is built from terminal parser objects such as #letter asParser for parsing one letter character, or #digit asParser for parsing one digit character. These terminal parsers are composed using operators like , (sequence), / (choice) or star (zero or more). The result is a composite parser whose structure is a graph of objects.
To test our parser we pass it an input string via the parse: method:
simpleName parse: 'ValidName'. "--> #($V #($a $l $i $d $N $a $m $e))"
If we print the result we obtain a rather strange-looking nested array that contains each individual character of the input string. We deal with the manipulation of the result later. For now, we are satisfied with getting a valid result.
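The shape of that array follows from how the individual operators build their results. As a small sketch, assuming PetitParser is loaded in the image, each terminal parser and operator can be tried on its own in a playground; the input strings are made up for illustration:

```smalltalk
"Terminal parsers consume exactly one character."
#letter asParser parse: 'a'.    "--> $a"
#digit asParser parse: '7'.     "--> $7"

"Sequence: both parsers must succeed, one after the other.
 The result is an array with one element per parser."
(#letter asParser , #digit asParser) parse: 'a7'.    "--> #($a $7)"

"Choice: the result of the first alternative that succeeds."
(#letter asParser / #digit asParser) parse: '7'.     "--> $7"

"Repetition: zero or more occurrences, collected into an array."
#digit asParser star parse: '123'.    "--> #($1 $2 $3)"
```

Seen this way, the result of simpleName is a two-element array: the first letter produced by the sequence, followed by the array of characters collected by star.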
How do we tell a valid result from an invalid one? Let’s give it a try by passing an invalid name:
simpleName parse: '1InvalidName'. "--> letter expected at 0"
If we inspect the result, we obtain an instance of PPFailure, which denotes that the parser was not successful in parsing the input. If we only want to test whether parsing succeeds, we can also use the matches: method:
simpleName matches: 'ValidName'. "--> true"
simpleName matches: '1InvalidName'. "--> false"
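When the parse result itself is needed, it can be checked explicitly instead. The following is a minimal sketch, assuming the isPetitFailure test and the message accessor of PPFailure as provided by PetitParser; the result variable and the Transcript output are only for illustration:

```smalltalk
| simpleName result |
simpleName := #letter asParser , ( #letter asParser / #digit asParser ) star.

result := simpleName parse: '1InvalidName'.

"Any object answers isPetitFailure; only PPFailure answers true."
result isPetitFailure
    ifTrue: [ Transcript show: 'Parsing failed: ' , result message; cr ]
    ifFalse: [ Transcript show: 'Parsed: ' , result printString; cr ].
```

This keeps the failure object at hand, so its message and position can be reported to the user rather than reduced to a boolean.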
Having sorted out the simple name, we can tackle ELEMENTNAME. It looks slightly more complex, and it requires us to specify an optional part that can follow a dot character. We approach it as we did before, by translating the abstract grammar notation into the PetitParser API.
elementName := #letter asParser , ( #letter asParser / #digit asParser ) star ,
(
$. asParser ,
#letter asParser , ( #letter asParser / #digit asParser ) star
) optional.
The optional part is specified by simply sending optional to the corresponding parser. Furthermore, to parse the dot character we use $. asParser.
On closer inspection, we notice duplication in the above definition: the first part, which covers the input up to the optional dot, is repeated inside the optional part. We could factor it out. In fact, we already have it in simpleName, and we can reuse it:
elementName := simpleName , ($. asParser , simpleName) optional.
We can test it:
elementName matches: 'Valid.Name'. "--> true"
elementName matches: '1InvalidName.'. "--> false"
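One thing to keep in mind when testing is that the second input above fails because of the leading digit, not because of the trailing dot. The sketch below assumes that parse: and matches: do not by themselves require the whole input to be consumed, and that sending end to a parser adds that requirement, as in the PetitParser version used here; the input strings are made up for illustration:

```smalltalk
| simpleName elementName |
simpleName := #letter asParser , ( #letter asParser / #digit asParser ) star.
elementName := simpleName , ($. asParser , simpleName) optional.

"The optional part fails after the dot, so the dot is left unconsumed
 and the prefix 'Valid' still counts as a match."
elementName matches: 'Valid.'.        "--> true"

"With end, the parser succeeds only if nothing is left over."
elementName end matches: 'Valid.'.        "--> false"
elementName end matches: 'Valid.Name'.    "--> true"
```

Whether to append end depends on whether the name is parsed on its own or as part of a larger grammar, where the remaining input is handled by the parsers that follow.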