Tuesday, October 9, 2018

C without the preprocessor

The C and C++ languages lack one utility that is found in many other languages: a package manager. Will they ever have one?

The biggest challenge to a package manager for C or C++ is not the package manager. We know how to build them, how to manage them, and how to maintain a community that uses them. Perl, Python, and Ruby have package managers. Java has one (sort of). C# has one. JavaScript has several! Why not C and C++?

The issue isn't in the C and C++ languages. Instead the issue is in the preprocessor, an external utility that modifies C or C++ code before the compiler does its work.

The problem with the preprocessor is that it can change just about any token in the code to something else, including statements which would be used by package managers. The preprocessor can change "do_this" to "do_that" or change "true" to "TRUE" or change "BEGIN" to "{".
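
To make that concrete, here is a minimal sketch of the kind of substitution described above. The macro names come from the examples in the previous paragraph; the functions do_that() and call_it() are hypothetical and exist only for illustration.

```c
#include <string.h>

/* Macros in the style described above (illustrative only). */
#define BEGIN {
#define END }
#define do_this do_that

/* The only function that is actually defined in this file. */
static const char *do_that(void)
{
    return "do_that";
}

/* The source text says do_this(), but the compiler only ever sees do_that().
   BEGIN and END are replaced by braces before the compiler runs. */
static const char *call_it(void)
BEGIN
    return do_this();
END
```

Anyone (or any tool) reading the source sees do_this, BEGIN, and END; the compiler sees none of them. A package manager that inspects source text would have to run the preprocessor first to know what the code really says.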

The idea of a package manager for C and C++ has been discussed, and someone (I forget the person now) listed a number of questions that the preprocessor raises for a package manager. I won't repeat the list here, but they were very good questions.

To me, it seems that a package manager and a preprocessor are incompatible. If you have one, you cannot have the other. (At least, not with any degree of consistency.)

So I started thinking... what if we eliminate the C/C++ preprocessor? How would that change the languages?

Let's look at what the preprocessor does for us.

For starters, it is the mechanism to include headers in programs. The "#include" lines are handled by the preprocessor, not the compiler. (When C was first designed, a preprocessor was considered a "win", as it separated some tasks from the compiler and followed the Unix philosophy of separation of duties.) We still need a way to include definitions of constants, functions, structures, and classes, so we need a replacement for the #include command.

A side note: C and C++ standards wonks will know that the standards do not require a separate preprocessor to handle "#include" lines. They dictate only that after certain lines (such as #include <string>) the compiler must exhibit certain behaviors. But this bit of arcane knowledge is not important to the general idea of eliminating the preprocessor.

The preprocessor allows for conditional compilation. Code inside "#if/#else/#endif" blocks is compiled or skipped, based on the expression that follows the "#if". Conditional compilation is extremely useful for software that has multiple targets, such as the Linux kernel (which targets many different processors).
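
A small sketch of conditional compilation, assuming a hypothetical build switch named TARGET_BITS (normally it would be set on the compiler command line, for example with -DTARGET_BITS=32). The type and function names are also made up for this example.

```c
/* Hypothetical build switch; defaults to 64 when the build does not set it. */
#ifndef TARGET_BITS
#define TARGET_BITS 64
#endif

#if TARGET_BITS == 64
/* Only this branch reaches the compiler on a 64-bit build. */
typedef unsigned long long machine_word;
static const char *target_name(void) { return "64-bit"; }
#else
/* On any other build, the compiler sees this branch instead. */
typedef unsigned int machine_word;
static const char *target_name(void) { return "32-bit"; }
#endif
```

The branch that is not selected never reaches the compiler at all, so it may contain code that would not even compile on the current target.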

The preprocessor also allows for macros and substitution of values. It accepts a "#define" line which can change just about any identifier into something else. This mechanism was traditionally used to implement the "max()" and "min()" functions.
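
Here is a sketch of the classic macro form of max(), next to an ordinary function for comparison. The names MAX, max_int, and the two helper functions are my own choices for illustration. The helpers show a well-known side effect of text substitution: an argument can be evaluated twice.

```c
/* Classic macro form of max(): pure text substitution. */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Function form, for comparison. */
static int max_int(int a, int b)
{
    return a > b ? a : b;
}

/* With the macro, i++ appears twice in the expanded text. */
static int macro_increments(void)
{
    int i = 5;
    int m = MAX(i++, 3);      /* expands to ((i++) > (3) ? (i++) : (3)) */
    (void)m;
    return i;                 /* i was incremented twice */
}

/* With the function, the argument is evaluated exactly once. */
static int function_increments(void)
{
    int i = 5;
    int m = max_int(i++, 3);
    (void)m;
    return i;                 /* i was incremented once */
}
```

The macro works for any type that supports ">", which is why it was popular; the function is type-checked and evaluates its arguments once. That trade-off is part of what any replacement for "#define" would have to address.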

All of that would be lost with the elimination of the preprocessor. As all of those features are used on many projects, they would all have to be replaced by some form of extension to the compiler. The compiler would have to read the included files, and would have to compile (or not compile) conditionally-marked code.
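
Some of this can already be approximated inside the language proper. The following is only a sketch, with hypothetical names (BUFFER_SIZE, USE_FAST_PATH, min_size, bytes_to_copy), of how constants, a switch, and a min() helper look when the compiler handles them instead of "#define" and "#if".

```c
#include <stddef.h>

/* Instead of "#define BUFFER_SIZE 256": a constant the compiler itself knows. */
enum { BUFFER_SIZE = 256 };

/* Instead of a "#define" switch tested with "#if": an ordinary constant. */
enum { USE_FAST_PATH = 1 };

/* Instead of a min() macro: a real function with typed arguments. */
static inline size_t min_size(size_t a, size_t b)
{
    return a < b ? a : b;
}

/* Both branches are parsed and type-checked; the compiler may drop the dead one. */
static size_t bytes_to_copy(size_t requested)
{
    if (USE_FAST_PATH) {
        return min_size(requested, BUFFER_SIZE);
    }
    return 0;
}
```

Two gaps remain even in this small sketch. It still needs an "#include" line to get size_t, and an ordinary "if" requires both branches to compile, so it cannot replace "#if" around code that exists only on some targets.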

Such a change is possible, but not easy. It would probably break a lot of existing code -- perhaps all nontrivial C and C++ programs.

Which means that removing the preprocessor from C and C++ and replacing it with something else changes the languages themselves. They would no longer be C and C++, but different languages, deserving of different names.

So in one sense you can remove the preprocessor from C and C++, but in another sense you cannot.
