Apart from complaining about ClearCase I have actually done some real work over the past few years.
Chapter 1: How we converted our entire COBOL code-base to Java (sort-of).
Already you might start to feel a sick feeling in the back of your throat. That’s normal and should pass shortly after you finish reading.
COBOL? WTF?!? This is usually the first thing people say to me when I tell them about what I’ve been working on. I’m sad to say that COBOL is still very much in use.
The dilemma that our company faced was that our main product has been around for many years and over time we have built up an unbelievably large code-base of COBOL, consisting of millions of lines. Re-writing this is both costly and time-consuming, not to mention risky. We needed another option.
Another serious problem is that CICS, the COBOL ‘application server’ if you will, running on dedicated Mainframe hardware, is bloody expensive up front, and that is before adding the Micro Focus licenses needed to compile COBOL. If we could run on a 100% Java stack, using open source technologies, we could save ourselves, and our customers, cold, hard cash.
At this point I need to mention something ‘special’ about how we use COBOL. To support a wide range of transaction systems and databases we developed a custom variation of the language, which included custom-built ‘macros’ which generate unique code depending on the environment. While not especially relevant to this article, this leads to larger-than-expected COBOL programs (and COBOL is large enough as it is). The size of the program is significant for a few reasons, which I’ll discuss below.
Initially we started with LegacyJ, a commercial product that advertised productive COBOL to Java conversion. What was nice about using LegacyJ was that we quickly discovered that it was, in fact, possible to convert our code successfully and have a running system. However, we ran into a few serious problems that made us hesitate.
Firstly, the Java generated by LegacyJ was quite lengthy and often didn’t compile due to the length of some methods and number of fields. Apparently Java has limits (a method’s bytecode cannot exceed 64 KB and a class cannot declare more than 65,535 fields), not that you would ever conceivably reach them in hand-written code. To work around this I had to re-parse the Java to break these methods into smaller chunks and introduce a hierarchy of classes to work around the field limit. Yuck.
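To give a rough idea of what that workaround looks like, here is a minimal sketch. The class, field, and method names are all hypothetical and this is not LegacyJ's actual output; it only shows the shape of the fix: fields spread over a chain of classes, and one huge method replaced by a driver that calls smaller chunks.

```java
// Hypothetical sketch: names are illustrative, not LegacyJ's generated code.

// Fields are spread across a chain of classes so that no single class
// declares more fields than the class file format allows.
class CustomerProgramFields1 {
    protected int customerId;
    protected String customerName = "";
}

class CustomerProgramFields2 extends CustomerProgramFields1 {
    protected long balanceCents;
    protected String statusCode = "";
}

// The program class inherits every field from the chain above.
class CustomerProgram extends CustomerProgramFields2 {

    // What used to be one enormous method becomes a thin driver...
    void run() {
        runPart1();
        runPart2();
    }

    // ...that calls smaller chunks, each well below the method size limit.
    private void runPart1() {
        customerId = 42;
        customerName = "ACME";
    }

    private void runPart2() {
        balanceCents = 1050L;
        statusCode = balanceCents > 0 ? "OK" : "EMPTY";
    }

    String summary() {
        return customerId + ":" + customerName + ":" + balanceCents + ":" + statusCode;
    }
}
```

In the real thing the chunks and field classes were produced mechanically by re-parsing the generated source, which is exactly why it felt so hacky.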
Secondly, the classes generated by LegacyJ didn’t separate the idea of ‘meta’ information such as variable types from runtime data. For each instance of a program, variables had effectively duplicate copies of type information, resulting in an extra-large memory footprint.
The other issue, and perhaps the most compelling, was that of money; LegacyJ was not cheap. We were trading one expensive platform, CICS, for another.
At the same time the following article appeared, introducing an open-source COBOL to Java converter called NACA. I tried it almost immediately but quickly found that many of our COBOL programs didn’t compile due to some commands that NACA hadn’t implemented. At first I gave up and went back to our LegacyJ integration. It was only later, after taking a second look, that I realised there was much more potential in NACA’s generated Java and general approach.
The most obvious advantage was that the Java was actually readable! At least if you count this as readable. The NACA team actually checked in their Java files after the conversion, so the code had to be both readable and maintainable. This also had the nice side-effect of allowing our absolutely massive generated COBOL programs to compile (in 99% of cases anyway).
In addition there was a separate and static class structure representing the program definition, meaning that each program required less memory.
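The idea of keeping the program definition separate from the runtime data can be sketched roughly as follows. Every name here is hypothetical and the layout is much simpler than anything NACA really does; the point is only that the type information is built once and shared, while each running instance carries nothing but its own bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: illustrative names only, not NACA's real classes.

// Immutable description of one COBOL variable (its 'meta' information).
final class VarDefinition {
    final String name;
    final int offset;  // position within the program's working storage
    final int length;  // size in bytes, e.g. PIC X(10) -> 10

    VarDefinition(String name, int offset, int length) {
        this.name = name;
        this.offset = offset;
        this.length = length;
    }
}

// Built once per program and shared by every running instance.
final class ProgramDefinition {
    private final Map<String, VarDefinition> vars = new LinkedHashMap<>();
    private int size;

    ProgramDefinition define(String name, int length) {
        vars.put(name, new VarDefinition(name, size, length));
        size += length;
        return this;
    }

    VarDefinition lookup(String name) { return vars.get(name); }
    int size() { return size; }
}

// Per-instance state is just a byte buffer; no type information is copied.
final class ProgramInstance {
    static final ProgramDefinition DEFINITION = new ProgramDefinition()
            .define("CUST-NAME", 10)
            .define("CUST-CODE", 4);

    private final byte[] storage = new byte[DEFINITION.size()];

    ProgramInstance() { Arrays.fill(storage, (byte) ' '); }

    // Stores the value left-justified and space-padded, like PIC X.
    void set(String name, String value) {
        VarDefinition d = DEFINITION.lookup(name);
        for (int i = 0; i < d.length; i++) {
            storage[d.offset + i] = (byte) (i < value.length() ? value.charAt(i) : ' ');
        }
    }

    String get(String name) {
        VarDefinition d = DEFINITION.lookup(name);
        return new String(storage, d.offset, d.length, StandardCharsets.US_ASCII);
    }
}
```

With thousands of variables per program and many concurrent instances, not duplicating the definitions per instance is where the memory saving comes from.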
I was given some time to investigate the possibility of making NACA work with our unique flavour of COBOL. Fortunately it turned out there wasn’t too much missing and I managed to get a working prototype in a reasonably short period of time. After that the decision to switch to a cheaper and open-source alternative which we could control wasn’t hard to make and we haven’t looked back since.
To avoid making this post longer than it already is I might save the important discussion of performance for another day. In short our pure-Java application runs surprisingly quickly. The biggest bottleneck is, without a doubt, memory. Running an entire COBOL runtime within the JVM is obviously costly in terms of memory, not helped by our generated COBOL and vast code-base.
Do I recommend this approach to others? Absolutely, without a doubt. There seem to be people advising against a direct port, or at least re-thinking the problem first. For us the issue is one of scale. There simply isn’t enough time/money to re-write everything, at least not in this decade. We needed to do something now; something we could guarantee would continue to work.
The benefits of running a pure-Java stack are, alone, compelling. One example that springs to mind is that of tracing. Once upon a time we would need to ask customers with a bug to recompile specific applications in trace mode in the vain hope that we actually knew where the problem was. Now we can leverage powerful Java logging (no, not that useless java.util.logging) and have full tracing across the entire stack; something that is invaluable for customer support.
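As a rough illustration of why this beats recompiling in trace mode, here is a tiny stand-in for a real logging framework. The `ProgramTracer` class and the program names are made up for this sketch; the only point is that tracing for a given program can be switched on while the system is running.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a stand-in for a real logging framework.
final class ProgramTracer {
    // Programs whose tracing is currently switched on.
    private final Set<String> enabled = ConcurrentHashMap.newKeySet();
    // Collected in memory here; a real setup would write to a log file.
    private final List<String> lines = new ArrayList<>();

    void enable(String program)  { enabled.add(program); }
    void disable(String program) { enabled.remove(program); }

    // Called by the runtime as each converted statement runs.
    synchronized void trace(String program, String message) {
        if (enabled.contains(program)) {
            lines.add(program + " | " + message);
        }
    }

    // Returns a copy so callers cannot modify the collected lines.
    synchronized List<String> lines() { return new ArrayList<>(lines); }
}
```

Calling `enable("BILLING")` on a live system starts capturing that program's trace from the next statement onwards, with no recompile and no guessing in advance where the bug might be.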
So, while I hate the idea of granting further life to our hideous COBOL demon, from a business point-of-view it has been crucial in the continued success and evolution of our product; giving us breathing room to slowly migrate COBOL logic to ‘normal’ Java applications while guaranteeing our business logic continues to serve our customers. Or at least that’s what our marketing brochures say; for me it was fun.