Hello. It gives me great pleasure to announce my COBOL parser. It is a
fixed-format COBOL parser. I expect that it could be extended to handle
free-format COBOL, but I have no need for that myself.
Code is located at:
http://www.smalltalkhub.com/#!/~cbc/PetitCobol
To invoke the parser, evaluate:
CobolProg parseCobolCodingForm: <fileName>
This 'parser' contains four parsers, plus a fair amount of additional logic to
prepare the files for the parsers (and to pass the output of earlier parsers on
to later ones). The rough outline of what happens:
1) File is read line by line. Each line is parsed as a formatted card.
2) Take these cards, and format them into sentences.
3) Parse the coding structure. (Parse it out into the various divisions,
and parse out the level 01 data).
4) Aggregate the structure into segments.
5) Finally, parse the actual code, division by division.
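To make step 1 concrete, here is a minimal sketch of splitting one fixed-format line into its card fields. It is written in Python for illustration and is not taken from PetitCobol; the names Card and parse_card are made up. Only the column layout (sequence number in columns 1-6, indicator in column 7, Area A in columns 8-11, Area B in columns 12-72, identification in columns 73-80) is the standard fixed-format convention.

```python
from dataclasses import dataclass


@dataclass
class Card:
    """One fixed-format source line, split by column (hypothetical type)."""
    sequence: str   # columns 1-6: sequence number area
    indicator: str  # column 7: '*' or '/' comment, '-' continuation
    area_a: str     # columns 8-11: division, section, paragraph headers
    area_b: str     # columns 12-72: statements
    ident: str      # columns 73-80: identification area

    @property
    def is_comment(self) -> bool:
        return self.indicator in ("*", "/")

    @property
    def is_continuation(self) -> bool:
        return self.indicator == "-"


def parse_card(line: str) -> Card:
    """Split a source line into card fields (hypothetical helper).

    Short lines are padded to 80 columns so every slice is well defined.
    Tab expansion is not handled here.
    """
    padded = line.rstrip("\r\n").ljust(80)
    return Card(
        sequence=padded[0:6],
        indicator=padded[6],
        area_a=padded[7:11],
        area_b=padded[11:72],
        ident=padded[72:80],
    )
```

A later pass (step 2) would then skip the cards where is_comment is true and join continuation cards before splitting the text into sentences.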
The parser includes a full AST representation, along with a visitor that
you can subclass to help handle the resulting AST.
The parser is not complete: it should parse any fixed-format COBOL program
file, but not all commands are implemented. I have implemented a way to
develop the parser iteratively. It parses each sentence as far as it can;
at the point where it cannot continue, it falls back to parsing into
a CDJunk (for data division unknowns) or a CobolStatement (for procedure
division unknowns). These fallback nodes point out missing commands (which
certainly exist) and possibly incomplete commands (which may exist); a simple
visitor over the AST that traps those nodes should find them.
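The fallback idea can be sketched roughly as follows. This is Python for illustration, not the PetitCobol Smalltalk code: Node, MoveStatement, parse_sentence and UnknownFinder are hypothetical names, and treating CobolStatement purely as a catch-all node is an assumption about how the real class is used.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Generic AST node (hypothetical)."""
    children: list = field(default_factory=list)


@dataclass
class MoveStatement(Node):
    """Stand-in for a command the parser already understands (hypothetical)."""
    source: str = ""
    target: str = ""


@dataclass
class CobolStatement(Node):
    """Catch-all for a sentence that no specific rule matched (assumed role)."""
    text: str = ""


def parse_sentence(sentence: str) -> Node:
    """Try the known commands first; otherwise keep the raw text."""
    words = sentence.rstrip(".").split()
    if len(words) == 4 and words[0] == "MOVE" and words[2] == "TO":
        return MoveStatement(source=words[1], target=words[3])
    return CobolStatement(text=sentence)


class UnknownFinder:
    """Walks the tree and collects the text of every catch-all node."""

    def __init__(self):
        self.unknown = []

    def visit(self, node: Node) -> list:
        if isinstance(node, CobolStatement):
            self.unknown.append(node.text)
        for child in node.children:
            self.visit(child)
        return self.unknown


sentences = ["MOVE A TO B.", "INSPECT X TALLYING Y FOR ALL 'Z'."]
program = Node(children=[parse_sentence(s) for s in sentences])
print(UnknownFinder().visit(program))
```

Each entry the finder reports is a command still to be implemented, so the list works as a to-do list for the next round of parser development.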
The parse leaves you with a CobolProg instance that holds the
final parsed AST in the variable formattedStructure. Comments from the code
are in the variable comments (along with the line number each one
came from). In addition, the results of most of the interim steps are also
kept in the CobolProg instance, should you be interested in them. If
not, you can send #cleanup to the instance to get rid of everything but the
final AST nodes.
Thanks,
cbc