Fitzpatrick's Fabulous Future: legacy code

Sunday, October 18, 2015

Languages become legacy languages because of their applications

Programming languages have brief lifespans and a limited set of destinies.

COBOL, invented in 1959, was considered passé in the microcomputer age (1977 to 1981, prior to the IBM PC).

Fortran, from the same era as COBOL, saw grudging acceptance and escaped the revulsion given COBOL, possibly because COBOL was specific to accounting and business applications and Fortran was useful for scientific and mathematical applications.

BASIC, created in the mid-1960s, was popular through the microcomputer and PC-DOS ages but did not transition to Microsoft Windows. Its namesake successor, Visual Basic, was a very different language; it was adored in the Windows era but reviled in the .NET era.

Programming languages have one of exactly two fates: despised as "legacy" or forgotten. Yet it is not the language (its style, syntax, or constructs) that defines it as a legacy language -- it is the applications written in the language.

If a language doesn't become popular, it is forgotten. The languages A-0, A-1, B-0, Autocode, Flow-matic, and BACAIC are among dozens of languages that have been abandoned.

If a language does become popular, then we develop applications -- useful, necessary applications -- in it. Those useful, necessary applications eventually become "legacy" applications. Once enough of the applications written in a language are legacy applications, the language becomes a "legacy" language. COBOL suffered this fate. We developed business systems in it and those systems are too important to abandon, yet also too complex to convert to another language, so COBOL lives on. But we don't build new systems in COBOL, and those of us who came to programming later don't like it.

The true mark of a legacy language is open disparagement. When a sufficient number of developers refuse to work with a language, it is a legacy language.

Java and C# are approaching the boundary of "legacy". They have been around long enough for enough people to have written enough useful, necessary applications. These applications are now becoming legacy applications: large, difficult to understand, and written in the older versions of the language. It is these applications that will doom C# and Java to legacy status.

I think we will soon see developers declining to learn Java and C#, focussing instead on Python, Ruby, Swift, Haskell, or Rust.

Monday, July 28, 2014

Improving code can cause an explosion of classes

Object-oriented programming took the world by storm in the 1990s. Those early days saw a lot of programmers learning new skills.

It took some time to truly learn object-oriented programming. The jump from structured programming (or procedural code) to object-oriented programming was not small. (And is still not small.)

Many early attempts at object-oriented programming were inelegant if not amateurish. They contained mistakes, though the errors are visible only in hindsight. Programmers who are inexperienced in a new technique make mistakes. (I'm one of them.)

Common problems were:

large classes with many purposes
long functions (procedural code wrapped in object clothing)
excessive inheritance
too little inheritance
weak encapsulation (little or no use of access control)
little or no composition

Legacy systems often contain these problems. As much as two decades after their inception, these systems still contain their original design flaws. The problems remain because they are difficult to correct and the return on the investment is unclear. I often argue that a better design reduces maintenance costs in the future, and the counter-argument is that the current development team knows the code and would gain little from an improved design.

When I can convince the system owners of the benefits of improved code (and I am becoming more convincing over time), we see a remarkable transformation in the code.

The most obvious change is in the number of classes. The revised system contains many more classes, often several times the original number. Yet while the number of classes increases, the total number of lines of code decreases. The construction of new classes allows for the consolidation of duplicate code, something that occurs often in legacy systems.
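The consolidation described above can be sketched in a few lines of Python. This is a minimal illustration, not code from any system mentioned here: the names `Money`, `InvoiceReport`, and `PayrollReport` are hypothetical, and the sketch assumes amounts are stored as non-negative whole cents. The formatting rule that might once have been pasted into each report now lives in one small class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Money:
    """Hypothetical small class: the single home for the formatting rule."""
    cents: int  # assumed to be a non-negative number of cents

    def formatted(self) -> str:
        # The one remaining copy of logic that each report used to duplicate.
        dollars, cents = divmod(self.cents, 100)
        return f"${dollars:,}.{cents:02d}"


class InvoiceReport:
    """Hypothetical report that previously carried its own formatting code."""

    def total_line(self, total_cents: int) -> str:
        return "Invoice total: " + Money(total_cents).formatted()


class PayrollReport:
    """A second hypothetical report sharing the same rule through Money."""

    def total_line(self, total_cents: int) -> str:
        return "Payroll total: " + Money(total_cents).formatted()
```

The class count goes up by one, but a change to the formatting rule now touches one place instead of every report.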

The new classes are usually small. Instead of the large, multipurpose classes of the earlier design, I move functions to small, single-purpose classes. Some classes are mere data containers; others hold one or two elements and provide a small number of functions on those elements. While small, these classes have a big effect on the readability of the code: they eliminate low-level operations from high-level and mid-level code, allowing the reader to focus on the higher-level operations.
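As a rough illustration of a class that holds two elements and offers a small number of functions on them, consider the sketch below. The names `DateRange` and `orders_in_period` are assumptions made for this example, as is the choice of an inclusive range. The point is that the comparison logic moves out of the higher-level function.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DateRange:
    """Hypothetical small class: two elements and a couple of operations."""
    start: date
    end: date

    def contains(self, day: date) -> bool:
        # Inclusive on both ends (an assumption of this sketch).
        return self.start <= day <= self.end

    def days(self) -> int:
        # Number of calendar days covered, counting both endpoints.
        return (self.end - self.start).days + 1


def orders_in_period(order_dates, period):
    # Higher-level code reads as intent; the date arithmetic lives in DateRange.
    return [day for day in order_dates if period.contains(day)]
```

A caller builds a range such as `DateRange(date(2014, 7, 1), date(2014, 7, 31))` and passes it along, without repeating the comparison at each call site.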

Small classes are much easier to test, and much easier to test with automated tools. Even C++ can use automated tests to verify the operation of classes. Automated tests relieve developers (and testers or "QA" folk) of a burden and allow them to direct their efforts to building and maintaining meaningful tests.
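To show how little is needed to test a small class, here is a sketch using Python's standard `unittest` module. The `StockLevel` class and its rules are hypothetical, invented only for this example. A class with one responsibility needs only a handful of short tests to cover its behavior.

```python
import unittest


class StockLevel:
    """Hypothetical small class: tracks the units on hand for one item."""

    def __init__(self, on_hand: int) -> None:
        if on_hand < 0:
            raise ValueError("on_hand must not be negative")
        self.on_hand = on_hand

    def remove(self, count: int) -> None:
        # Refuse to remove more than is available; leave the level unchanged.
        if count > self.on_hand:
            raise ValueError("not enough stock")
        self.on_hand -= count


class StockLevelTest(unittest.TestCase):
    def test_remove_reduces_on_hand(self):
        level = StockLevel(10)
        level.remove(3)
        self.assertEqual(level.on_hand, 7)

    def test_remove_rejects_overdraw(self):
        level = StockLevel(2)
        with self.assertRaises(ValueError):
            level.remove(5)
        # The failed removal must not have changed the level.
        self.assertEqual(level.on_hand, 2)
```

Saved to a file, the tests can be run with `python -m unittest` followed by the file name, so they can run on every build without manual effort.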

A large number of small classes provides an additional benefit: the ability to group classes into libraries. Large (or large-ish) early object-oriented systems tend to group all of the classes into a single package, usually called "the application". With a large number of classes, the system maintainers see groups of classes emerge (perhaps all of the database classes, or all of the elementary data classes). These groupings can be formalized with libraries. For very large projects, these libraries can be maintained by different teams. Libraries can also be shared across multiple projects, reducing the duplication of effort at a larger scale.

Modernizing legacy systems can lead to an "explosion of classes", and this can be a good thing. Smaller classes are easier to understand and maintain. They can be tested independently. They can be grouped into libraries. Do not fear such an increase in the number of classes in your code.

Wednesday, February 26, 2014

Legacy code isn't code -- it's a lack of tests

We have all heard of legacy code. Some of us have had the (mis)fortune to work on it. But where does it come from? How is it created?

If legacy code is nothing more than "really old code", then when does normal code become legacy code? How old does code have to be to earn the designation "legacy"?

I've worked on a number of projects. The projects used different programming languages (C, Visual Basic, C++, Java, C#, Perl). They had different team sizes. They were at different companies with different management styles. Some projects had old code but it wasn't legacy code. Some projects created new code that was legacy code from the first day.

Legacy code is not merely old code. Legacy code is code that few (if any) team members want to modify or enhance. It is code that has a reputation, code that contains risk. It is hard to maintain and easy to break.

Legacy code is code without tests.

With a set of tests -- a comprehensive set of tests -- one can change code and be sure that it still works. That is a powerful position. With tests to verify the operation of the code, programmers can refactor code and simplify it, knowing that mistakes will be caught.

Without tests, programmers limit their changes to the bare minimum. Changes are small, surgical operations that adjust the smallest number of lines of code. The objective is not to improve the code, not to make it readable or more reliable, but to avoid breaking something.

Changes to code with tests also strive to avoid breaking things, but the programmer doesn't need the paranoia-like fear of changes. The tests verify the code, and frequent testing identifies errors quickly. The programmer can focus on improvements to the code.
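One way to picture this is a set of recorded expectations that both the old and the new code must satisfy. The sketch below is an assumption-laden illustration: the shipping rules, the function names, and the recorded cases are all hypothetical. The tangled routine is pinned down by tests first, and the simplified version is then checked against the same cases.

```python
def shipping_cost_legacy(weight_kg, express):
    # Hypothetical legacy routine: it works, but the rules are tangled.
    cost = 0
    if express == True:
        if weight_kg > 10:
            cost = 20 + (weight_kg - 10) * 3
        else:
            cost = 20
    else:
        if weight_kg > 10:
            cost = 5 + (weight_kg - 10) * 3
        else:
            cost = 5
    return cost


def shipping_cost(weight_kg, express):
    # Refactored version: the same rules, each stated once.
    base = 20 if express else 5
    overweight_kg = max(0, weight_kg - 10)
    return base + overweight_kg * 3


# Expected values recorded from the legacy routine before any refactoring.
CASES = [
    ((1, False), 5),
    ((10, False), 5),
    ((12, False), 11),
    ((1, True), 20),
    ((12, True), 26),
]


def check(func):
    # Raises AssertionError, naming the failing case, if behavior has drifted.
    for args, expected in CASES:
        assert func(*args) == expected, (args, expected)
```

Calling `check(shipping_cost_legacy)` confirms that the recorded cases describe the existing behavior, and calling `check(shipping_cost)` after each change confirms that the simplification has not altered it.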

Don't ask "Is our code legacy code?" Ask "Do we have comprehensive tests?"
