Monday, October 14, 2013

Executables, source code, and automated tests

People who use computers tend to think of the programs as the "real" software.

Programmers tend to have a different view. They think of the source code as the "real" software. After all, they can always create a new executable from the source code. The generative property of source code gives it priority over the merely performant property of executable code.

But that logic leads to an interesting conclusion. If source code is superior to executable code because the former can generate the latter, then how should we regard tests, especially automated tests?

Automated tests can be used to "generate" source code. The process is not automated in the way that a compiler converts source code to an executable, but it is similar. Given a set of tests, a framework in which to run the tests, and the ability to write source code (and compile it for testing), one can create the source code that produces a program that conforms to the tests.

That was a bit of a circuitous route. Here's the concept in a diagram:


     automated tests --> source code --> executable code


This idea has been used in a number of development techniques: test-driven development (TDD), extreme programming (XP), and the agile methods that adopt the practice. All use the concept of "test first, then code", in which automated tests are defined first and only then is the code changed to conform to the tests.

The advantage of "test first" is that you have tests for all of your code. You are not allowed to write code "because we may need it someday". You either have a test (in which case you write code) or you don't (in which case you don't write code).
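To make "test first" concrete, here is a minimal sketch in Python using the standard unittest module. The feature (a word counter) and the names word_count, WordCountTests, and run_tests are my own inventions for illustration, not part of any real project. The tests are written first; the function below them contains only as much code as the tests demand.

```python
import unittest


# Step 1: the tests, written before any production code exists.
# (word_count is a hypothetical feature chosen for illustration.)
class WordCountTests(unittest.TestCase):
    def test_empty_string_has_no_words(self):
        self.assertEqual(word_count(""), 0)

    def test_words_are_separated_by_whitespace(self):
        self.assertEqual(word_count("tests come  first"), 3)


# Step 2: the code, written only to make the tests above pass.
def word_count(text):
    # str.split() with no arguments splits on runs of whitespace
    # and returns an empty list for an empty string.
    return len(text.split())


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(WordCountTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

Note what is absent: there is no handling of punctuation or hyphenated words, because no test asks for it. If we needed that behavior, we would add a test first.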

A project that follows the "test first" method has tests for all features. If the source code is lost, one can re-create it from the tests. Granted, it might take some time -- this is not a simple re-compile operation. A complex system will have thousands of tests, perhaps hundreds of thousands. Writing code to conform to all of those tests is a manual operation.

But it is possible.
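A small sketch of what that re-creation might look like, in Python. Everything here is hypothetical: I assume the lost function was a "slugify" routine and that three test cases survived. The conforms function plays the role of the test framework; any candidate rewrite either passes every case or it does not.

```python
# Hypothetical test cases that survived the loss of the source code.
# Each pair is (input, expected output) for a lost "slugify" function.
CASES = [
    ("Hello World", "hello-world"),
    ("  extra   spaces  ", "extra-spaces"),
    ("already-slugged", "already-slugged"),
]


def conforms(candidate):
    """Return True if the candidate passes every surviving test case."""
    return all(candidate(given) == expected for given, expected in CASES)


# One possible rewrite. It need not match the lost code line for line;
# it only has to conform to the tests.
def slugify_rewrite(text):
    return "-".join(text.lower().split())


# A careless rewrite. It mishandles repeated and leading spaces,
# and the tests reject it.
def slugify_careless(text):
    return text.lower().replace(" ", "-")
```

Here conforms(slugify_rewrite) is true and conforms(slugify_careless) is false. The rewritten code is only as good as the tests are complete, which is why completeness matters so much in the argument that follows.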

A harder task is going in the other direction, that is, writing tests from the source code. It is too easy to omit cases, to skip functionality, to misunderstand the code. Given the choice, I would prefer to start with tests and write code.

Therefore, I argue that the tests are the true "source" of the system, and the entity we consider "source code" is a derived entity. If I were facing a catastrophe and had to pick one (and only one) of the tests, the source code, and the executable code, I would pick the tests -- provided that they were automated and complete.
