Look Before You Leap!
Duplicate Code Increases Risk During Legacy Modernization and Inhibits the Journey towards Digital Transformation
Introduction
For many years, industry analysts have stated that over 50% of legacy modernization projects fail. Even those deemed “successful” often disappoint their consumers when deployed. Needless to say, the high failure rate has resulted in massive resource and financial waste. In many organizations, it has also created a fear of failure that prevents positive change. This lack of change equates to lost revenue opportunities, lost market share and slow adoption of disruptive technologies. In some sense, these failures are the proverbial anchor being hauled along the path to Digital Transformation.
To reduce risk and increase the chances of success, a plethora of tools, services and techniques has been developed to assist in legacy modernization, whether lifting-and-shifting mainframe systems to lower-cost platforms (is this really “modernization”?), COTS replacement, automated transformation or greenfield rewrites. More recently, newer architectural principles have emerged to aid in this area as well.
Despite all of the considerable (and generally very helpful) progress made, it is surprising how few modernization projects take advantage of a simple and extremely effective way of reducing project scope and complexity – eliminating similar and duplicate source code.
Why should organizations identify and remediate duplicate code when modernizing? Consider the following points:
- Working with a smaller, better architected set of source code can greatly reduce the sheer amount of required work
- Duplicate code can be greatly reduced, or even eliminated, in the new system, making it easier to understand and maintain
- Many tools and services charge by the line of code worked on. Why pay for the same code to be worked on multiple times?
- Hardware and licensing costs for the target platform, whether in a lift-and-shift or a COTS implementation, may be reduced by having a significantly smaller system
The remainder of this paper will discuss the issue, approaches and potential benefits of identifying and reducing similar and duplicate code and architecting a solution to it in the target environment.
What do you mean we have Duplicate Code?!
Having actionable intelligence on duplicate source code allows for the creation of a roadmap for scope reduction not only through obvious “blocking and tackling” techniques but also through more sophisticated techniques involving architectural approaches. It is surprisingly difficult to get most IT organizations to recognize they have a problem in this area. The difficulty is compounded by the need to both acknowledge the general problem and to figure out ways to discover and aggregate meaningful information about it.
One extreme example we encountered involved a taxing authority in the southeastern United States. For years, the authority’s IT staff had repurposed a base set of about 65 programs in its tax system every year. In each of these programs, the staff would copy large blocks of code and modify the copies to reflect rates and other statutory changes for the current tax year. They would then extend a rather large IF-THEN-ELSE block to branch to the newly inserted code. As a result, the code had become quite bloated over the years. Worse, the duplicated code could not easily be removed without rearchitecting the programs, which would introduce significant new complexity, fragility and risk.
The correct architectural approach involves not simply reducing code program by program, but also looking for opportunities to extract duplicate processes and consolidate them into a more centralized, modular structure. We discuss this in greater detail below.
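Applied to the tax example above, the Python sketch below (Python is used purely for illustration; the authority’s programs were legacy code) contrasts the copy-per-year structure with one possible consolidation that moves the year-specific values into a lookup table. The function names, tax years and rates are hypothetical placeholders, not figures from that engagement.

```python
from decimal import Decimal

CENTS = Decimal("0.01")


# BEFORE: each tax year is a copied-and-edited block, selected by an
# ever-growing IF-THEN-ELSE chain. Years and rates are hypothetical.
def compute_tax_copied(tax_year: int, taxable: Decimal) -> Decimal:
    if tax_year == 2021:
        rate = Decimal("0.050")
        return (taxable * rate).quantize(CENTS)
    elif tax_year == 2022:
        rate = Decimal("0.052")
        return (taxable * rate).quantize(CENTS)
    elif tax_year == 2023:
        rate = Decimal("0.055")
        return (taxable * rate).quantize(CENTS)
    raise ValueError(f"unsupported tax year {tax_year}")


# AFTER: the year-specific values live in data (a table here, perhaps a
# database in practice) and the calculation logic exists exactly once.
RATES_BY_YEAR = {
    2021: Decimal("0.050"),
    2022: Decimal("0.052"),
    2023: Decimal("0.055"),
}


def compute_tax(tax_year: int, taxable: Decimal) -> Decimal:
    rate = RATES_BY_YEAR.get(tax_year)
    if rate is None:
        raise ValueError(f"unsupported tax year {tax_year}")
    return (taxable * rate).quantize(CENTS)
```

Adding a new tax year then means adding one table entry rather than another copied block. This only works cleanly when the yearly changes are confined to values; where statutory changes alter the logic itself, the rearchitecting is harder, which is exactly the risk the example describes.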
How did we get all this redundancy in our Legacy Systems?!
Even managers who acknowledge the problem have trouble controlling it. In our experience, when we raise the undesirable practices that creep into legacy systems, most managers chuckle and quip sarcastically, “Our programmers would never do that!” Yet it is well accepted that no legacy application is perfectly written, and almost all legacy systems contain duplicate code. The factors that have led to the accumulation of this technical debt include:
- The overall life of these systems. It is not uncommon for legacy systems to be 20 or even 30 years old, which means that today’s system embodies 20 or 30 years of changes and additions.
- Developers have come and gone over the years. New developers assume responsibility without prior grounding in the system and find it easier (and initially less risky) to copy and modify existing code than to take the time required to truly understand its internal workings and make more elegant modifications.
- IT practices sometimes require the creation of “shadow systems”, which can be near clones of existing systems and/or their subcomponents.
- Mergers and acquisitions often bring in disjointed legacy systems that perform similar functions. These systems must either be merged with current systems or have their parallel functionality retained. In some cases, code may be copied between systems to quickly bring them into compliance.
- Developers are plagiaristic at heart: why write something from scratch when a skeleton or a nearly matching set of functionality already exists somewhere and can be copied?
How do I Identify Duplicate and Partially Duplicate Code?!
Because of the size and complexity of most legacy systems, a tools-based approach is required to properly identify duplicate and partially duplicate code – both within and across programs.
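As a rough illustration of what such tooling does at its very simplest, the Python sketch below fingerprints every run of N consecutive lines (after normalizing case and whitespace) and reports fingerprints that occur more than once, whether within one program or across programs. The function names and the default window size are assumptions made for this example; production tools typically also strip COBOL sequence and comment areas, tokenize the source, and merge overlapping matches into larger regions.

```python
import hashlib
from collections import defaultdict


def normalize(line: str) -> str:
    """Collapse runs of whitespace and upper-case the line so that purely
    cosmetic differences do not hide a clone."""
    return " ".join(line.split()).upper()


def find_exact_duplicates(programs: dict, window: int = 5) -> dict:
    """Return {fingerprint: [(program, first_line_number), ...]} for every run
    of `window` consecutive normalized lines that occurs more than once.

    `programs` maps a program name to its list of source lines."""
    locations = defaultdict(list)
    for name, lines in programs.items():
        normalized = [normalize(line) for line in lines]
        # Slide a fixed-size window over the program and hash each run of lines.
        for start in range(len(normalized) - window + 1):
            chunk = "\n".join(normalized[start:start + window])
            fingerprint = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
            locations[fingerprint].append((name, start + 1))  # 1-based line number
    # Keep only fingerprints seen at two or more locations.
    return {fp: locs for fp, locs in locations.items() if len(locs) > 1}
```

A clone longer than the window shows up as several overlapping matches, which is one reason real tools add a merging step on top of this idea.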
A thorough analysis may surface thousands of opportunities to reduce code and/or create a better-architected target system. And while exact duplicates are easier to find, it is critical to also find code that only partially matches. Consider the example below, in which we discovered a block of code that is repeated multiple times with a high degree of similarity.
Notice that only a single difference, the PROGRAM-ID literal, exists between the two blocks of code, which reside in two different programs.
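Exact fingerprints miss blocks like these, because one changed line alters the hash. One simple way to score near matches, sketched here in Python with the standard library’s difflib, is to align two candidate blocks line by line and report both a similarity ratio and the lines that differ. The helper names are hypothetical, and this is not how any particular analysis tool works internally.

```python
import difflib


def _normalize(line: str) -> str:
    # Ignore case and spacing differences when comparing lines.
    return " ".join(line.split()).upper()


def compare_blocks(left: list, right: list):
    """Align two candidate clone blocks line by line.

    Returns (ratio, differences): ratio is difflib's similarity score in [0, 1];
    differences lists (tag, left_lines, right_lines) for each non-matching span."""
    matcher = difflib.SequenceMatcher(
        a=[_normalize(line) for line in left],
        b=[_normalize(line) for line in right],
        # Stop frequently repeated lines (e.g. MOVE SPACES) from being
        # treated as junk when blocks are long.
        autojunk=False,
    )
    differences = [
        (tag, left[i1:i2], right[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
    return matcher.ratio(), differences
```

For two four-line heading blocks that differ only in the PROGRAM-ID line, this yields a ratio of 0.75 and a single "replace" entry pointing at that line, which is exactly the kind of parameterization candidate discussed next.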
Leveraging File and Block Comparison Results to Reduce Code and Complexity
To take full advantage of the opportunities to reduce code and complexity while more effectively architecting a new target system, consider the two code blocks immediately above that exist in two different programs.
Reducing Duplicate Code in the Legacy System Itself
From a legacy COBOL perspective, a simple approach is to move this code out to a copy file and use COPY … REPLACING syntax in each place the code appears.
The code is modified slightly and moved out of both programs into a new copy file named “COPYRP01” (for example), as follows:
       8000-PCL-HEADINGS.
           MOVE FUNCTION CURRENT-DATE TO WS-DATE-TIME
           MOVE SPACES TO PCL-REPORT-REC
           MOVE 'PROGRAM CONTROL LOG ' TO PCL-REPORT-REC(29:20)
           WRITE PCL-OUT FROM PCL-REPORT-REC
           DISPLAY PCL-REPORT-REC
           MOVE SPACES TO PCL-REPORT-REC
           WRITE PCL-OUT FROM PCL-REPORT-REC
           DISPLAY PCL-REPORT-REC
           MOVE SPACES TO PCL-REPORT-REC
      *    MOVE 'PROGRAM-ID : DEMO03A ' TO PCL-REPORT-REC(01:20)
      *    Replaced with:
           MOVE :ZZZZ: TO PCL-REPORT-REC(01:20)
           WRITE PCL-OUT FROM PCL-REPORT-REC
           DISPLAY PCL-REPORT-REC
           MOVE SPACES TO PCL-REPORT-REC
           MOVE 'RUN DATE : ' TO PCL-REPORT-REC(01:20)
           MOVE WS-CURR-DATE-MM TO PCL-REPORT-REC(14:02)
           MOVE '/' TO PCL-REPORT-REC(16:01)
           MOVE WS-CURR-DATE-DD TO PCL-REPORT-REC(17:02)
           MOVE '/' TO PCL-REPORT-REC(19:01)
           MOVE WS-CURR-DATE-YY TO PCL-REPORT-REC(20:04)
It is then referenced in each program that requires it, for example:
       COPY COPYRP01 REPLACING ==:ZZZZ:==
                            BY =='PROGRAM-ID : DEMO03A '==.
… in program DEMO03A (the program on the left side of the comparison shown above), and then:
       COPY COPYRP01 REPLACING ==:ZZZZ:==
                            BY =='PROGRAM-ID : DN002D '==.
… in program DEMO02A (the program on the right side of the comparison shown above).
This technique moves the duplicate code out of the programs and creates a single copy, with the one differing element externalized for easy modification. For 100% matching code, no REPLACING phrase is required; a simple COPY statement can be used instead.
If several elements differ, a single COPY statement can list multiple REPLACING pairs (==old-1== BY ==new-1== ==old-2== BY ==new-2== …), one for each differing element.
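To show the mechanics, the Python sketch below models how one copy file can expand into program-specific text through zero, one or several REPLACING pairs. It is a deliberate simplification: a COBOL compiler matches pseudo-text word by word, whereas this model does plain single-pass string substitution. The names COPYRP01_EXCERPT and expand_copy are hypothetical.

```python
import re
from collections.abc import Sequence

# Excerpt of the COPYRP01 copy file; :ZZZZ: marks the one element that varies.
COPYRP01_EXCERPT = """\
           MOVE SPACES TO PCL-REPORT-REC
           MOVE :ZZZZ: TO PCL-REPORT-REC(01:20)
           WRITE PCL-OUT FROM PCL-REPORT-REC"""


def expand_copy(copybook: str, replacing: Sequence = ()) -> str:
    """Model COPY ... [REPLACING old BY new ...] as single-pass substitution.

    With no pairs this behaves like a plain COPY: the text is included unchanged.
    Pairs are tried in the order given, and replacement text is not rescanned.
    (Simplification: real COBOL matches pseudo-text word by word.)"""
    if not replacing:
        return copybook
    by_old = dict(replacing)
    # One alternation pattern so every occurrence is replaced in a single pass.
    pattern = re.compile("|".join(re.escape(old) for old, _ in replacing))
    return pattern.sub(lambda match: by_old[match.group(0)], copybook)
```

Calling expand_copy once per program, each time with that program’s literal, reproduces every original variant from one shared source.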
Reducing Duplicate Code While Transforming to a Modern Language – an Architectural Approach
During almost every type of IT modernization project (including lift-and-shift, automated transformations, rewrites and COTS implementations), identifying duplicate code and creating a mitigation strategy for it greatly increases the chance of successfully completing the project on time and on budget.
Using an architectural approach, externalizing duplicate code and consolidating it into a single implementation within one or more common support classes can greatly reduce the scope of the transformation processes. The diagram below illustrates this:
The duplicate code has been extracted into single methods in a common support class. Where code is only partially duplicated, the differing portions are parameterized so that every program can share one implementation. The programs have been transformed into classes containing only their unique code. This also makes for a much more efficient and maintainable modern system design. Business processes are now more encapsulated and distinguishable, offering a better path towards digital transformation.
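A minimal Python sketch of this design, using the PCL heading logic from the earlier COBOL example, might look like the following. ReportSupport, pcl_headings, Demo03A and Demo02A are hypothetical names, each program is assumed to pass its own program name as the varying element, and the layout mirrors the COBOL reference modifications (title at column 29, date at column 14). Writing the lines to a file or console is left to the caller.

```python
from datetime import date


class ReportSupport:
    """Hypothetical common support class: the heading logic that was duplicated
    across programs (8000-PCL-HEADINGS) now exists exactly once."""

    @staticmethod
    def pcl_headings(program_id: str, run_date: date) -> list:
        # The only element that differed between programs is now a parameter.
        return [
            " " * 28 + "PROGRAM CONTROL LOG",           # title starts at column 29
            "",                                         # blank separator line
            f"PROGRAM-ID : {program_id}",
            f"{'RUN DATE :':<13}{run_date:%m/%d/%Y}",  # date starts at column 14
        ]


class Demo03A:
    """Transformed program: keeps only the logic unique to DEMO03A."""
    PROGRAM_ID = "DEMO03A"

    def headings(self, run_date: date) -> list:
        return ReportSupport.pcl_headings(self.PROGRAM_ID, run_date)


class Demo02A:
    """Transformed program: keeps only the logic unique to DEMO02A."""
    PROGRAM_ID = "DEMO02A"

    def headings(self, run_date: date) -> list:
        return ReportSupport.pcl_headings(self.PROGRAM_ID, run_date)
```

A change to the heading layout is now made once in ReportSupport rather than in every program that used to carry its own copy.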
Summary
Implementing a strategy to recognize and mitigate duplicate code can vastly reduce the size of the current legacy code base and make maintenance more efficient and less costly while significantly reducing risk.
During legacy application modernization, mitigating duplicate code can greatly reduce risk by narrowing the scope of the effort, and can significantly reduce both the cost of that effort and the size of the target platform itself. The resulting reduction in total cost of ownership (TCO) will continue to pay large dividends well into the future.
With business processes now being better encapsulated and identifiable, moving along the journey towards digital transformation becomes more realistic and achievable.