Thursday, May 5, 2022

C but for Intel X86

Let's step away from current events in technology and indulge in a small reverie. Let's play a game of "what if" with past technology, specifically the Intel 8086 processor.

I will start with the C programming language. C was designed for the DEC PDP-7 and PDP-11 processors. Those processors had some interesting features, and the C language reflects them. One example is the increment and decrement operators ('++' and '--'), which map closely to addressing modes on the DEC processors.
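To make the PDP-11 connection concrete, here is the classic C string-copy loop. This is a minimal sketch in which the function name copy_string is hypothetical. The PDP-11 instruction in the comment shows the kind of code a compiler could generate for one step of the loop; it is an illustration, not the output of any particular compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Copy a null-terminated string using C's post-increment idiom.
   Each "*dst++ = *src++" reads a byte, stores it, and advances both
   pointers. On a PDP-11, one step of this loop could be a single
   instruction using autoincrement addressing: MOVB (R1)+,(R0)+
   The caller must supply a destination large enough for src. */
char *copy_string(char *dst, const char *src)
{
    char *start = dst;

    assert(dst != NULL && src != NULL);
    /* The loop stops after the terminating '\0' has been copied. */
    while ((*dst++ = *src++) != '\0')
        ;
    return start;
}
```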

Suppose someone had developed a programming language for the Intel 8086, just as Kernighan and Ritchie developed C for the PDP-11. What would it look like?

Let's also suppose that this programming language was developed just as the 8086 was designed, or shortly thereafter. The 8086 was released in 1978. We're looking at the 8086 (or the 8088, which has the same instruction set) and not the later processors.

The computing world in 1978 was quite different from today. Personal computers were just entering the market. Apple and Commodore had computers that used the 6502 processor; Radio Shack's TRS-80 used the Z-80, and a 68000-based Radio Shack model would follow a few years later.

Today's popular programming languages didn't exist. There was no Python, no Ruby, no Perl, no C#, and no Java. The common languages were COBOL, FORTRAN, BASIC, and assembler. Pascal was available for some systems. Those would be the reference point for a new programming language. (C did exist, but only within the small Unix community.)

Just as C leveraged features of the PDP-7 and PDP-11 processors, our hypothetical language would leverage features of the 8086. What are those features?

One feature that jumps out is text strings. The 8086 has string instructions (MOVSB, CMPSB, SCASB, and others, which can be repeated with a REP prefix), and SCASB can scan for the null byte that terminates a string. It seems reasonable that an 8086-centric language would support strings. They might even be a built-in type for the language. (Strings do require memory management, but that is a feature of the run-time library, not the programming language itself.)
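As a sketch of the work such instructions take over, here is a C function (the name text_length is hypothetical) that finds the length of a null-terminated string. The comment describes how the same job is commonly done on the 8086 with REPNE SCASB; the register setup shown is the usual idiom, not code from any particular compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Return the length of a null-terminated string.
   On the 8086 this whole loop can collapse into one repeated
   instruction: with the direction flag cleared, AL = 0, CX = 0xFFFF,
   and ES:DI pointing at the text, REPNE SCASB scans forward until it
   finds the zero byte. NOT CX followed by DEC CX then leaves the
   length in CX. */
size_t text_length(const char *s)
{
    const char *p = s;

    assert(s != NULL);
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```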

The 8086 supports BCD (binary-coded decimal) arithmetic, with instructions such as DAA and DAS to adjust the results of binary addition and subtraction. BCD math is rare today, but it was common on IBM mainframes and a standard way to encode data for exchange with other computers.
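For readers who have not met BCD, here is a minimal C sketch of packed-BCD addition, assuming eight decimal digits packed four bits each into a uint32_t (the name bcd_add and this layout are assumptions for illustration). It carries one decimal digit at a time, which gives the same result the 8086 produces two digits at a time by following an ADD with DAA (decimal adjust after addition).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add two packed-BCD values of up to eight digits each (one digit per
   four bits). A digit sum above 9 is reduced by 10 and a carry passes
   to the next digit. A carry out of the eighth digit goes to *carry_out. */
uint32_t bcd_add(uint32_t a, uint32_t b, int *carry_out)
{
    uint32_t result = 0;
    unsigned carry = 0;

    for (int shift = 0; shift < 32; shift += 4) {
        unsigned da = (a >> shift) & 0xF;
        unsigned db = (b >> shift) & 0xF;
        assert(da <= 9 && db <= 9);   /* inputs must be valid BCD */

        unsigned sum = da + db + carry;
        carry = sum > 9;
        if (carry)
            sum -= 10;
        result |= (uint32_t)sum << shift;
    }
    if (carry_out != NULL)
        *carry_out = (int)carry;
    return result;
}
```

An amount such as 1234.56 could be held as 0x00123456, with the position of the decimal point implied by the field definition rather than stored.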

The 8086 had a segmented architecture, with four segment registers (code, data, stack, and "extra"). Those four segments map well to Pascal's organization of code, static data, stack, and heap. (C and C-derivatives use the same organization.) A language could support dynamic allocation of memory and recursive functions (two things that were not available in COBOL, FORTRAN, or BASIC). It could also support a "flat" organization like those used in COBOL and FORTRAN, in which all variables and functions are laid out at link time and fixed in position.
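The segments are not separate memories; each is a window of up to 64 KB into the 8086's 1 MB address space. As a minimal sketch (the function name physical_address is hypothetical), here is how the processor combines a 16-bit segment value with a 16-bit offset.

```c
#include <assert.h>
#include <stdint.h>

/* Compute the 20-bit physical address the 8086 forms from a 16-bit
   segment value and a 16-bit offset: the segment is shifted left four
   bits (multiplied by 16) and the offset is added. The result wraps at
   1 MB because the 8086 has only 20 address lines. */
uint32_t physical_address(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}
```

Because the segment value is multiplied by 16, many segment:offset pairs name the same byte; 0x1234:0x0010 and 0x1235:0x0000 both refer to physical address 0x12350.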

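There would be no increment or decrement operators, because the 8086 lacks general-purpose auto-increment and auto-decrement addressing modes like the PDP-11's.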

Who would build such a language? I suppose a natural choice would be Intel, as they knew the processor best. But then again, maybe not: they were busy with hardware design and had no operating system on which to run a compiler or interpreter.

The two big software houses for small systems (at the time) were Microsoft and Digital Research. Both had experience with programming languages. Microsoft provided BASIC for many different computer systems, and also had FORTRAN and COBOL. Digital Research provided CBASIC (a compiled BASIC), and its founder, Gary Kildall, had written PL/M (a derivative of IBM's PL/I) for Intel.

IBM would probably not create a language for the 8086. They had no offering that used that processor. The IBM PC would not arrive until 1981, and IBM didn't consider it a serious computer, at least not until people started buying PCs in large quantities.

DEC, at the time successful with minicomputers, also had no offering that used the 8086. DEC offered many languages, but for its own processors.

Our language might have been developed by a hardware vendor such as Apple or Radio Shack, but they, like Intel, were busy with hardware and did very little software.

So it may have been either Microsoft or Digital Research. Both companies were oriented toward business, so a language developed by either of them would be oriented toward business as well. A new language for business might be modelled on COBOL, but COBOL didn't allow for variable-length strings. FORTRAN was oriented toward numeric processing, and it didn't handle strings either. Even Pascal had difficulty with variable-length strings.

My guess is that our new language would mix elements of each of the popular languages. It would be close to Pascal, but with more flexibility for text strings. It would support BCD numeric values, not only in calculations but also in input-output operations. And it would be influenced by COBOL's verbose style.

We might see something like this:

PROGRAM EXAMPLE
DECLARE
  FILE DIRECT CUSTFILE = "CUST.DAT";
  RECORD CUSTREC
    BCD ACCTNUM PIC 9999,
    BCD BALANCE PIC 9999.99,
    STRING NAME PIC X(20),
    STRING ADDRESS PIC X(20);
  FILE SEQUENTIAL TRANSFILE = "TRANS.DAT";
  RECORD TRANSREC
    BCD ACCTNUM PIC 9999,
    BCD AMOUNT PIC 999.99;
  STRING NAME;
  BCD PAYMENT,TAXRATE;
PROCEDURE INIT
  OPEN CUSTFILE, TRANSFILE;
PROCEDURE MAINLINE
  WHILE NOT END(TRANSFILE) BEGIN
    READ TRANSFILE TO TRANSREC;
  END;
CALL INIT;
CALL MAINLINE;

and so forth. This borrows heavily from COBOL; it could equally borrow from Pascal.
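For comparison, C has no BCD or PIC types, so a C programmer handling the CUSTREC record above would spell out the layout by hand. This is a minimal sketch, assuming packed BCD with two digits per byte and fixed-width text fields; the struct and function names, and the layout itself, are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One hypothetical C layout for CUSTREC: two packed-BCD digits per
   byte, text fields fixed at 20 characters (assumed space-padded). */
struct custrec {
    unsigned char acctnum[2];   /* PIC 9999: four digits                   */
    unsigned char balance[3];   /* PIC 9999.99: six digits, point implied  */
    char name[20];              /* PIC X(20)                               */
    char address[20];           /* PIC X(20)                               */
};

/* Convert a packed-BCD field (most significant digits first) to binary. */
uint32_t bcd_to_binary(const unsigned char *field, int nbytes)
{
    uint32_t value = 0;

    for (int i = 0; i < nbytes; i++) {
        unsigned hi = field[i] >> 4;
        unsigned lo = field[i] & 0xF;
        assert(hi <= 9 && lo <= 9);   /* each nibble must be a decimal digit */
        value = value * 100 + hi * 10 + lo;
    }
    return value;
}
```

With this layout a balance of 99.50 is stored as the three bytes 0x00 0x99 0x50 and decodes to 9950 hundredths.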

It might have been a popular language: less verbose than COBOL, but still able to process transactions efficiently; structured programming from Pascal, but with better input-output; and BCD data for efficient storage and for transfer to other systems.

It could have been a contender. In an alternate world, we could be using programming languages derived not from C but from this hybrid language. That might have solved some problems (such as buffer overflows) but given us others. (What problems, you ask? I don't know. I just invented the language; I'll need some time to find the problems.)


1 comment:

Unknown said...

Hi John, great blog post. Shortly after this timeframe (1980-1984) I did a lot of work parsing and concatenating text strings while studying computational linguistics. I used Pascal, LISP, and PNLP in grad school. I had access to the Brown University Corpus of English language text, and I gave my students assignments to analyze text strings. I went to work for LOGOS, a machine translation company whose code was all in FORTRAN. One of our competitors ran their stuff on Wang machines.