August 06, 2009

Koopa Cobol Parser

I just uploaded the very first release of the Koopa Cobol Parser project to SourceForge. You can find the website for the project here.

If you're wondering "why?", then this project probably isn't for you. A short answer is that there exist no free, extensible, adaptable Cobol parsers which are able to handle real legacy Cobol code. Koopa tries to fill that gap.

While it is still very much an alpha release, there are some redeeming features:

  • It makes extensive use of unit testing at the level of individual grammar rules. This helps with quick, detailed detection of problems.
  • It includes an ANSI 85 test suite, which it is able to process quite well. There are some failures, but these are quite reasonable.
  • It has been run on over 1.5GB of industrial Cobol code, and again performs quite well.

So if you're looking for a no-strings-attached flexible Cobol parser, give Koopa a try!

PS. If you're wondering about the name, the thought process went something like: Cobol Parsing -> Co Pa -> Koopa...

11 comments:

Unknown said...

Oh no, it's written in Java :-)

KrisDS said...

Java is the new Cobol, so that's quite fitting. :-)

As an aside: the grammar itself is in a custom DSL, which is Java independent. It will take some work, but you can write a new parser generator for that DSL in your favourite language and escape from Java.

Unknown said...

Ah, a DSL. I already wondered about the .g files.

KrisDS said...

Actually the .g files are standard ANTLR grammars. :-)

The main DSL is in the .kg files (for Koopa Grammar). There are also .stage files which represent grammar tests, and .scoring files which are used in verification of the results of the parsing.

GKINF said...

Hi! Very good job!

But I have some problems using it.
I don't know Java (!).
Sorry, but I'm a newbie (for this job)!

For example, I want to obtain only two output files (XML or text).

One with the working-storage variables (hierarchy, attributes, etc.).
One with the tree of the procedures.

How can I change your code to do this? Which module?

And if I want to implement a new reserved word (e.g. "SKIP"), is it necessary to change "CobolGrammar.java", and what else?

Bye and thanks a lot

KrisDS said...

GKINF, Hi!

Well, you're going to have to use Java. No way around that.

For the specialised output files: a custom tree parser should do the trick. Look for "MyAdaptiveTreeParser.g" in the sandbox folder for an example of a specialised tree parser. Replace "System.out.println" with writes to a file (a FileWriter will do), and adapt the grammar so that you capture the right data (you might want to check out some ANTLR documentation as well).
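A rough sketch of the FileWriter idea, not taken from the Koopa sources: the class name `TreeOutput` and the method `writeLines` are made up for illustration, and it assumes a Java version with try-with-resources (Java 7 or later). It only shows the file-writing half; capturing the right data is still the tree parser's job.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.List;

// Hypothetical helper, NOT part of Koopa: sends output to a file
// instead of printing it to the console.
final class TreeOutput {

    // Writes one line per entry to the given file, replacing any old content.
    static void writeLines(String path, List<String> lines) throws IOException {
        // try-with-resources flushes and closes the file, even on failure.
        try (PrintWriter out = new PrintWriter(new FileWriter(path))) {
            for (String line : lines) {
                out.println(line); // was: System.out.println(line);
            }
        }
    }
}
```

Inside a grammar action you would more likely keep a single PrintWriter open for the whole run and call `println` on it wherever the example currently prints.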

As for extending the grammar: don't update the "CobolGrammar.java"! This file is generated from "Cobol.kg". What you need to do is update that one and regenerate the Java code (you can do that with the ANT build.xml file; target "regenerate").

But, again, all of this will require quite a good amount of Java knowledge. In addition you'll need to know about ANT, ANTLR, and Koopa itself.

GKINF said...

Thanks for your fast answer!

I understood that!

Java first!

But this isn't a normal Java!
I know the parsing process well, but I didn't identify the individual elements of the Koopa process.
Now, with your instructions (the grammar and the adaptive parser example), it is better.

I will try to implement something, and if I solve it I will come back with the solution!
Bye for now
Greetings from Gianni

KrisDS said...

"But this isn't a normal Java!"

I guess not. :-)

Good luck, and looking forward to seeing the results. If you need more help, you can always mail me on my SourceForge account.

modernator said...

Koopa's approach to parsing using a DSL is indeed a great idea for dealing with the complexities of COBOL.

I am trying this out and found that it does not currently support the preprocessor directives that are popular on the mainframe, such as:
*CBL (for supplying compiler directives)
*EJECT, SKIPn for print control
* -INC - a Librarian directive for copybook inclusion.

How would I go about extending Koopa to support such additional features?

KrisDS said...

@MasterProgrammer: Without looking at the details, in order of preference I would either go with extending the grammar or adding it as a lexer stage (a tokenizer in Koopa parlance).
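To give an idea of what a separate stage could look like, here is a minimal sketch of a line-based pre-pass. It is an assumption-heavy illustration, not Koopa's tokenizer API: the class `DirectiveFilter`, its methods, and the matching rules are all hypothetical, the rules are simplified rather than checked against any compiler manual, and it assumes the sequence-number area of each line is blank.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical pre-pass, NOT Koopa's tokenizer API.
final class DirectiveFilter {

    // Simplified, assumed shapes for the directives mentioned above:
    // *CBL ..., EJECT, SKIP1/SKIP2/SKIP3, and -INC <member>.
    private static final Pattern DIRECTIVE = Pattern.compile(
        "\\*CBL\\b.*|EJECT\\.?|SKIP[1-3]\\.?|-INC\\s+\\S.*",
        Pattern.CASE_INSENSITIVE);

    // True when the whole line, ignoring surrounding blanks, is a directive.
    static boolean isDirective(String line) {
        return DIRECTIVE.matcher(line.trim()).matches();
    }

    // Replaces directive lines with empty lines, so that the line numbers
    // of the remaining source stay the same for later stages.
    static List<String> blankDirectives(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            result.add(isDirective(line) ? "" : line);
        }
        return result;
    }
}
```

Blanking is only good enough for the print-control lines. An inclusion directive like -INC would need to be expanded rather than dropped, which is where extending the grammar or a proper stage becomes the better fit.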

If you would like to see support for this I'm willing to help out. Just contact me on SourceForge and provide some details or references to what you're looking for.

modernator said...

@Kris, thanks for the quick response. I will contact you with the details via SourceForge.