Hello. It gives me great pleasure to announce my
COBOL parser. This is a fixed-format COBOL parser. I expect
that it could be extended to work with free-format COBOL, but I
have no need for that.
Code is located at:
To invoke the parser, evaluate:
CobolProg parseCobolCodingForm: <fileName>
This 'parser' contains 4 parsers plus a fair amount of
additional logic to prep the files for the parsers (and to
prep output from earlier parsers for later ones). The rough
outline of what happens:
1) File is read line by line. Each line is parsed as a
formatted card.
2) Take these cards, and format them into sentences.
3) Parse the coding structure. (Parse it out into the
various divisions, and parse out the level 01 data).
4) Aggregate the structure into segments.
5) Finally, parse the actual code, division by division.
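To make steps 1 and 2 concrete, here is a rough sketch of the idea.
It is written in Python purely for illustration and is not the
parser's own code; the names Card, parse_card and to_sentences are
hypothetical. It assumes the standard fixed-format card layout
(columns 1-6 sequence number, column 7 indicator, columns 8-11 Area A,
columns 12-72 Area B, columns 73-80 identification). The sentence
step is deliberately naive: it only closes a sentence when a card
ends with a period, and it ignores continuation cards and literals.

```python
from dataclasses import dataclass


@dataclass
class Card:
    """One source line split into the fixed-format card fields."""
    line_no: int    # 1-based line number in the source file
    sequence: str   # columns 1-6
    indicator: str  # column 7 ('*' or '/' marks a comment card)
    area_a: str     # columns 8-11
    area_b: str     # columns 12-72
    ident: str      # columns 73-80

    @property
    def is_comment(self) -> bool:
        return self.indicator in ("*", "/")


def parse_card(line: str, line_no: int) -> Card:
    # Pad to 80 columns so every slice below is well defined.
    text = line.rstrip("\r\n").ljust(80)
    return Card(line_no, text[0:6], text[6], text[7:11],
                text[11:72], text[72:80])


def to_sentences(cards):
    """Naively join non-comment cards until one ends with a period."""
    sentences, current = [], []
    for card in cards:
        if card.is_comment:
            continue
        code = (card.area_a + card.area_b).strip()
        if not code:
            continue  # skip blank cards
        current.append(code)
        if code.endswith("."):
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text with no closing period
        sentences.append(" ".join(current))
    return sentences
```

Keeping the line number on each card is what would let a later stage
report where a comment or an unparsed sentence came from.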
The parser includes a full AST representation, along with a
visitor that you can subclass to help handle the resulting AST.
The parser is not complete - it should parse any fixed-format
COBOL program file, but not all commands are implemented. I
have implemented a way to develop the parser iteratively. It
will parse each sentence up to the point where it cannot
continue - at that point, it will parse into a CDJunk (for
data division unknowns) or a CobolStatement (for program
division unknowns). These nodes point out any missing commands
(which exist) and possibly incomplete commands (which may
exist); a simple visitor over the AST that traps those nodes
should find them.
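As an illustration of that kind of visitor, here is a small sketch,
again in Python rather than the parser's own language. Only the node
names CDJunk and CobolStatement come from the parser; the Node,
Division, Visitor and UnparsedFinder classes, and the text and
children attributes, are hypothetical stand-ins for the real AST and
visitor classes.

```python
class Node:
    """Hypothetical AST node: some source text plus child nodes."""
    def __init__(self, text="", children=()):
        self.text = text
        self.children = list(children)


class Division(Node):
    pass


class CDJunk(Node):          # unparsed data division content
    pass


class CobolStatement(Node):  # unparsed program division content
    pass


class Visitor:
    """Dispatches on the node's class name, else walks the children."""
    def visit(self, node):
        method = getattr(self, "visit_" + type(node).__name__,
                         self.generic_visit)
        return method(node)

    def generic_visit(self, node):
        for child in node.children:
            self.visit(child)


class UnparsedFinder(Visitor):
    """Collects every node the parser could not fully handle."""
    def __init__(self):
        self.found = []

    def visit_CDJunk(self, node):
        self.found.append(("data", node.text))
        self.generic_visit(node)

    def visit_CobolStatement(self, node):
        self.found.append(("program", node.text))
        self.generic_visit(node)
```

Running such a finder over a parsed program would give a to-do list
of the commands still to implement.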
The parse will leave you with a CobolProg instance
containing the final parsed AST in the variable
formattedStructure. Comments in the code will be in the
variable comments (along with the line number that they
originated from). In addition, most of the interim steps will
also be present in the CobolProg instance, should you be
interested in them. If not, you can send #cleanup to the
instance to get rid of all but the final AST nodes.
Thanks,
cbc