Wednesday, April 10, 2019

Program language and program size

Can programs be "too big"? Does it depend on the language?

In the 1990s, the two popular programming languages from Microsoft were Visual Basic and Visual C++. (Microsoft also offered Fortran and an assembler, and I think COBOL, but they were used rarely.)

I used both Visual Basic and Visual C++. With Visual Basic it was easy to create a Windows application, but the applications in Visual Basic were limited. You could not, for example, launch a modal dialog from within a modal dialog. Visual C++ was much more capable; you had the entire Windows API available to you. But the construction of Visual C++ applications took more time and effort. A simple Visual Basic application could be "up and running" in a minute. The simplest Visual C++ application took at least twenty minutes. Applications with dialogs took quite a bit of time in Visual C++.

Visual Basic was better for small applications. They could be written quickly, and changed quickly. Visual C++ was better for large applications. Larger applications required more design and coding (and more testing) but could handle more complex tasks. Also, the performance benefits of C++ were only obtained for large applications.

(I will note that Microsoft has improved the experience since those early days of Windows programming. The .NET framework has made a large difference. Microsoft has also improved the dialog editors and other tools in what is now called Visual Studio.)

That early Windows experience got me thinking: are some languages better at small programs, and other languages better at large programs? Small programs written in languages that require a lot of code (verbose languages) have a disadvantage because of the extra work. Visual C++ was a verbose language; Visual Basic was not -- or was less verbose. Other languages weigh in at different points on the scale of verbosity.

Consider a "word count" program. (That is, a program to count the words in a file.) Different languages require different amounts of code. At the small-program end of the scale we have languages such as AWK and Perl. At the large-program end of the scale we have COBOL.

(I am considering lines of code here, and not executable size or the size of libraries. I don't count run-time environments or byte-code engines.)

I would much rather write (and maintain) the word-count program in AWK or Perl (or Ruby or Python). Not because these languages are modern, but because the program itself is small. (Trivial, actually.) The program in COBOL is large; COBOL has some string-handling functions (but not many) and it requires a fair amount of overhead to define the program. A COBOL program is long, by design. The COBOL language is a verbose language.
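
To give a sense of how small "small" can be, here is a minimal sketch of a word-count program in Python. It assumes that a "word" is any run of characters separated by whitespace, and that the files are UTF-8 text; the function name count_words is only illustrative.

```python
import sys


def count_words(text: str) -> int:
    """Count the whitespace-separated words in a string."""
    # Assumption: a "word" is any run of non-whitespace characters.
    return len(text.split())


if __name__ == "__main__":
    # Print the word count for each file named on the command line.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            print(count_words(handle.read()), path)
```

The counting itself is a single line; the rest is reading files. An equivalent COBOL program would need its divisions and data definitions before any counting could begin.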

Thus, there is an incentive to build small programs in certain languages. (I should probably say that there is an incentive to build certain programs in certain languages.)

But that is on the small end of the scale of programs. What about the other end? Is there an incentive to build large programs in certain languages?

I believe that the answer is yes. Just as some languages are good for small programs, other languages are good for large programs. The languages that are good for large programs have structures and constructs which help us humans manage and understand code at large scale.

Over the years, we have developed several techniques we use to manage source code. They include:

  • Multiple source files (#include files, copybooks, separate compiled files in a project, etc.)
  • A library of subroutines and functions (the "standard library")
  • A repository of libraries (CPAN, CRAN, gems, etc.)
  • The ability to define subroutines
  • The ability to define functions
  • Object-oriented programming (the ability to define types)
  • The ability to define interfaces
  • Mix-in fragments of classes
  • Lambdas and closures

These techniques help us by partitioning the code. We can "lump" and "split" the code into different subroutines, functions, modules, classes, and contexts. We can define rules to limit the information that is allowed to flow between the multiple "lumps" of a system. Limiting the flow of information simplifies the task of programming (or debugging, or documenting) a system.
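
As a rough illustration of limiting the flow of information between "lumps", here is a small Python sketch. All of the names (WordSource, TextSource, tally) are hypothetical and chosen only for this example. The function tally knows nothing about where the words come from; it sees only the one method that the interface allows.

```python
from typing import List, Protocol


class WordSource(Protocol):
    """The interface: the only information allowed to cross the boundary."""

    def words(self) -> List[str]:
        ...


class TextSource:
    """One 'lump' of code. Its internal state stays inside the class."""

    def __init__(self, text: str) -> None:
        # The leading underscore marks this as private by convention.
        self._text = text

    def words(self) -> List[str]:
        return self._text.split()


def tally(source: WordSource) -> int:
    """A second 'lump'. It depends on the interface, not on TextSource."""
    return len(source.words())
```

A different source (a file, a network feed) could replace TextSource without any change to tally, which is the kind of partitioning that makes a large program easier to understand.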

Is there a point when a program is simply "too big" for a language?

I think there are two concepts lurking in that question. The first is a relative answer, and the second is an absolute answer.

Let's start with a hypothetical example. A thought experiment, if you will.

Let's imagine a program. It can be any program, but it is small and simple. (Perhaps it is "Hello, world!") Let's pick a language for our program. As the program is small, let's pick a language that is good for small programs. (It could be Visual Basic or AWK.)

Let's continue our experiment by increasing the size of our program. As this is a hypothetical program, we can easily expand it. (We don't have to write the actual code -- we simply expand the code in our mind.)

Now, keeping our program in mind, and remembering our initial choice of a programming language, let us consider other languages. Is there a point when we would like to switch from our chosen programming language to another language?

The relative answer applies to a language when compared to a different language. In my earlier example, I compared Visual Basic with Visual C++. Visual Basic was better for small programs, Visual C++ for large programs.

The exact point of change is not clear. It wasn't clear in the early days of Windows programming, either. But there must be a crossover point, where the situation changes from "better in Visual Basic" to "better in Visual C++".

The two languages don't have to be Visual Basic and Visual C++. They could be any pair. One could compare COBOL and assembler, or Java and Perl, or Go and Ruby. Each pair has its own crossover point, but the crossover point is there. Each pair of languages has a point at which it is better to select the more verbose language, because of its capabilities at managing large code.

That's the relative case, which considers two languages and picks the better of the two. Then there is the absolute case, which considers only one language.

For the absolute case, the question is not "Which is the better language for a given program?", but "Should we write a program in a given language?". That is, there may be some programs which are too large, too complex, too difficult to write in a specific programming language.

Well-informed readers will be aware that a program written in a language that is "Turing complete" can be translated into any other programming language that is also "Turing complete". That is not the point. The question is not "Can this program be written in a given language?" but "Should this program be written in a given language?".

That is a much subtler question, and much more subjective. I may consider a program "too big" for language X while another might consider it within bounds. I don't have metrics for such a decision -- and even if I did, one could argue that my cutoff point (a complexity value of 2000, say) is arbitrary and the better value is somewhat higher (perhaps 2750). One might argue that a more talented team can handle programs that are larger and more complex.

Someday we may have agreed-upon metrics, and someday we may have agreed-upon cutoff values. Someday we may be able to run our program through a tool for analysis, one that computes the complexity and compares the result to our cut-off values. Such a tool would be an impartial judge for the suitability of the programming language for our task. (Assuming that we write programs that are efficient and correct in the given programming language.)
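
To make the idea concrete, here is a toy sketch of what such a tool might look like, written in Python and analyzing Python source. Everything about it is an assumption: the metric (one plus a count of branching constructs, loosely in the spirit of cyclomatic complexity) is not an agreed-upon measure, and the cutoff of 2000 is simply the arbitrary figure mentioned above.

```python
import ast

# Assumption: these node types are a crude stand-in for "complexity".
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp, ast.IfExp)

# Assumption: an arbitrary cutoff, not an agreed-upon value.
HYPOTHETICAL_CUTOFF = 2000


def complexity_score(source: str) -> int:
    """Return one plus the number of branching constructs in the source."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, BRANCH_NODES) for node in ast.walk(tree))


def within_bounds(source: str, cutoff: int = HYPOTHETICAL_CUTOFF) -> bool:
    """Report whether the program's score is at or below the cutoff."""
    return complexity_score(source) <= cutoff
```

The sketch shows only the mechanics of "compute a number, compare it to a cutoff". Choosing a metric and a cutoff that people would accept is the hard part, and this code does nothing to settle it.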

Someday we may have all of that, and the discipline to discard (or re-design) programs that exceed the boundaries.

But we don't have that today.
