Showing posts with label structured programming. Show all posts
Wednesday, April 20, 2022

Advances in programming come from restraints

Advances in programming come from, to a large extent, advances in programming languages. And those advances in programming languages, unlikely as it seems, are mostly not expanded features but restrictions.

That advances come from restrictions seems counter-intuitive. How do fewer choices make us better programmers?

Let's look at some selected changes in programming languages, and how they enabled better programming.

The first set of restrictions was structured programming. Structured programming introduced the concepts of the IF/THEN/ELSE statement and the WHILE loop. More importantly, structured programming banished the GOTO statement (and its cousin, the IF/GOTO statement). This restriction was an important advancement for programming.

A GOTO statement allows for arbitrary flows of control within programs. Structured programming's IF/THEN/ELSE and WHILE statements (and WHILE's cousin, the FOR statement) force structure onto programs. Arbitrary flows of control were not possible.

The result was programs that were harder to write but easier to understand, easier to debug, and easier to modify. Structured programming -- the loss of GOTO -- was an advancement in programming.

A similar advance occurred with object-oriented programming. Like structured programming, object-oriented programming was a set of restrictions coupled with a set of new features. In object-oriented programming, those restrictions were encapsulation (hiding data within a class) and the limiting of functions (requiring an instance of the class to execute). Data encapsulation protected data from arbitrary changes; one had to go through functions (in well-designed systems) to change the data. Instance functions were limited to executing on instances of the class, which meant that one had to *have* an instance of the class to call the function. Functions could not be called at arbitrary points in the code.

Both structured programming and object-oriented programming advanced the state of the art for programming. They did it by restricting the choices that programmers could make.

I'm going to guess that future advancements in programming will also come from restrictions in new programming languages. What could those restrictions be?

I have a few ideas.

One idea is immutable objects. This idea has been tested in the functional programming languages. Those languages often have immutable objects, objects which, once instantiated, cannot change their state.

In today's object-oriented programming languages, objects are often mutable. They can change their state, either through functions or direct access of member data.

Functional programming languages take a different view. Objects are immutable: once formed they cannot be changed. Immutable objects enforce discipline in programming: you must provide all of the ingredients when instantiating an object; you cannot partially initialize an object and add things later.
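
A minimal sketch of that discipline, assuming Python and its standard `dataclasses` module. The `Point` class and the helper `try_to_mutate` are hypothetical names chosen for illustration. Every field must be supplied at construction, and later assignment is refused.

```python
from dataclasses import dataclass, FrozenInstanceError, replace


@dataclass(frozen=True)
class Point:
    """A hypothetical immutable value: all fields are required up front."""
    x: float
    y: float


def try_to_mutate(p: Point) -> bool:
    """Return True if assignment succeeded, False if the object refused it."""
    try:
        p.x = 99.0  # frozen dataclasses raise on attribute assignment
    except FrozenInstanceError:
        return False
    return True


origin = Point(0.0, 0.0)
mutated = try_to_mutate(origin)

# "Changing" an immutable object means building a new one from the old one.
moved = replace(origin, x=3.0)
```

Note that the frozen form here is opt-in: the language allows immutable designs but does not favor them over mutable ones.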

I would like to see a programming language that implements immutable objects. But not perfectly -- I want to allow for some objects that are not immutable. Why? Because the shift to "all objects are immutable" is too much, too fast. My preference is for a programming language to encourage immutable designs and require extra effort to design mutable objects.

A second idea is a limit to the complexity of expressions.

Today's programming languages allow for any amount of complexity in an expression. Expressions can be simple (such as A + 1) or complex (such as A + B/C - sqr(B + 7) / 2), or worse.

I want expressions to be short and simple. This means breaking a complex expression into multiple statements. The only language that I know that placed restrictions on expressions was early FORTRAN, and then only for the index to an array variable. (The required form was I*J+K, where I, J, and K were optional.)

Perhaps we could design a language that limited the number of operations in an expression. Simpler expressions are, well, simpler, and easier to understand and modify. Any expression that contained more than a specific number of operations would be an error, forcing the programmer to refactor the expression.
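
As a rough sketch of what such a rule might look like, here is a checker built on Python's standard `ast` module. The limit of 3 and the names `count_operations` and `check_expression` are assumptions made for illustration; the count is approximate (a chained comparison counts as one operation, for example).

```python
import ast

MAX_OPERATIONS = 3  # hypothetical limit; no specific number is proposed above


def count_operations(expression: str) -> int:
    """Count operators and function calls in a single Python expression."""
    tree = ast.parse(expression, mode="eval")
    counted = (ast.BinOp, ast.UnaryOp, ast.BoolOp, ast.Compare, ast.Call)
    return sum(isinstance(node, counted) for node in ast.walk(tree))


def check_expression(expression: str) -> None:
    """Raise SyntaxError when an expression exceeds the limit."""
    operations = count_operations(expression)
    if operations > MAX_OPERATIONS:
        raise SyntaxError(
            f"{operations} operations in {expression!r}; limit is {MAX_OPERATIONS}"
        )


simple = count_operations("A + 1")
complicated = count_operations("A + B/C - sqr(B + 7) / 2")
```

Under this counting, the simple expression has one operation and the complex one has six (five arithmetic operators plus the call to `sqr`), so `check_expression` would reject the second and force it to be split into several statements.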

A third idea is limits on the size of functions and classes. Large functions and large classes are harder to understand than small functions and small classes. Most programming languages have a style-checker, and most style-checkers issue warnings for long functions or classes with lots of functions.

I want to strengthen those warnings and change them to errors. A function that is too long (I'm not sure how long is too long, but that's another topic) is an error -- and the compiler or interpreter rejects it. The same applies to a class: too many data members, or too many functions, and you get an error.
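
A sketch of the function-length half of that idea, again assuming Python's standard `ast` module (which records `lineno` and `end_lineno` for each definition in Python 3.8 and later). The limit of 5 lines is a deliberately tiny, hypothetical value, and `oversized_functions` is an illustrative name, not part of any real tool.

```python
import ast

MAX_FUNCTION_LINES = 5  # hypothetical; "how long is too long" is left open above


def oversized_functions(source: str, limit: int = MAX_FUNCTION_LINES) -> list[str]:
    """Return the names of functions whose definitions span more than `limit` lines."""
    tree = ast.parse(source)
    too_long = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            if length > limit:
                too_long.append(node.name)
    return too_long


sample = (
    "def short():\n"
    "    return 1\n"
    "\n"
    "def long():\n"
    "    a = 1\n"
    "    b = 2\n"
    "    c = 3\n"
    "    return a + b + c\n"
)
offenders = oversized_functions(sample, limit=3)
```

A compiler that enforced the rule would stop with an error when this list is non-empty, rather than printing a warning.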

But like immutable objects, I will allow for some functions to be larger than the limit, and some classes to be more complex than the limit. I recognize that some classes and functions must break the rules. (But the mechanism to allow a function or class to break the rules must be a nuisance, more than a simple '@allowcomplex' attribute.)

Those are the restrictions that I think will help us advance the art of programming. Immutable objects, simple expressions, and small functions and classes.

Of these ideas, I think the immutable objects will be the first to enter mainstream programming. The concept has been implemented, some people have experience with it, and the experience has been positive. New languages that combine object-oriented programming with functional programming (much like Microsoft's F#, which is not so new) will allow more programmers to see the benefits of immutable objects.

I think programming will be better for it.

Thursday, August 13, 2020

Add Functional Programming to Object-oriented Programming, not the other way round

The programming world has multiple paradigms, or styles of programming. The reigning champion is Object-oriented Programming, embodied in C++, Java, and C# and oriented around, well, objects. The prior champion was Structured Programming (or Procedural Programming), represented by C and Pascal and oriented around control structures (and avoiding GOTOs). The challenger is Functional Programming, oriented around functions and embodied in languages such as Haskell and F#. An older paradigm is Unstructured Programming (USP), which does use GOTOs and lacks the modern IF/THEN/ELSE and DO/WHILE constructs of Structured Programming.

SP and USP operate in the same space: small and medium-size programs. Since they operate in the same space, one often picks one of them, but does not use both. (One can use both, but it is not easy.)

Object-oriented programming (OOP) operates in "the large" and helps us organize large systems. Structured Programming (SP) and Functional Programming (FP) operate in "the small" and help us build short blocks of code that are reliable and readable. In this view, SP complements OOP and they work well together. SP and FP, on the other hand, compete in the same space. We can build large systems with OOP and SP, or with OOP and FP. (We can build small programs with SP alone, or with FP alone. OOP is not useful with small programs.)

Structured Programming is demonstrably better than Unstructured Programming. Functional programming is arguably better than the older Structured Programming. One would think that we would use the best paradigm, and that we would use Functional Programming. But we don't. It takes time to learn a programming paradigm, and most programmers know the Structured Programming paradigm. Structured Programming is what we have, buried inside many modern systems.

The mix of Object-oriented Programming and Structured Programming makes sense. To use a programming paradigm, or a mix of paradigms, one must have a language that supports the paradigms. C++, Java, and C# all support OOP and SP. Pascal supports only SP; Object Pascal and Delphi support both. Haskell supports FP.

The advocates of Functional Programming have noticed that their preferred languages have achieved modest acceptance, but none have become as popular as the most popular languages C++, Java, and C#, or even Python. That lack of acceptance makes some sense, as a Functional Programming language does not have the Object-oriented Programming concepts that help us organize large systems. (Conversations with practitioners have probably ended with project managers and OOP programmers saying something along the lines of "Functional Programming is nice, but we have a large code base in an Object-oriented Programming language, and we cannot give up that organization.")

I think these conversations have spurred the advocates of Functional Programming to merge OOP into FP languages, in the hope that project managers and programmers will accept the new combination. The combination of object-oriented programming and functional programming is a nice trick, and it works.

But this approach is, in my view, the wrong approach.

When programmers and managers say that they want a programming language that supports OOP and FP, they don't really mean that they want an FP programming language that supports OOP.

What they want is a programming language that supports their current code base and allows them to add Functional Programming in small doses.

They want a compromise, much like C++ was a compromise for OOP. C++ offered classes to C programmers -- and was originally called "C with classes". It didn't mandate a pure OOP approach. Instead, it let programmers add OOP to an existing code base, and add OOP in a gradual manner.

The "FP plus OOP" approach fails because the existing code base is not OOP, but OOP with SP. A "FP plus OOP" language requires that all of the SP code blocks be re-written in FP. That can work for tiny programs and classroom exercises, but it doesn't work in the industry with large sets of code.

Yes, some companies have large, million-lines-of-code systems written in FP. But those systems were built in FP from the beginning.

Companies that have large, million-lines-of-code systems written in OOP (with procedural code too) don't want to stop everything and re-write that code. They want to add FP in small doses. They want the benefit of FP in a few key locations. Over time, they can expand FP to other parts of the system.

They don't want an FP language with OOP extensions. They want an "OOP and SP" language with FP extensions.
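
To make "small doses" concrete, here is a sketch in Python, a language that supports all three styles. The `Invoice` class and its methods are hypothetical. The class (OOP) and the loop-based method (SP) stay as they are; one method is written in a functional style alongside them.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class Invoice:
    """Existing object-oriented structure (hypothetical example)."""
    amounts: list

    def total_structured(self) -> float:
        # Structured style: a loop that updates an accumulator.
        total = 0.0
        for amount in self.amounts:
            if amount > 0:
                total = total + amount
        return total

    def total_functional(self) -> float:
        # Functional style, added in one spot: filter, then fold, no reassignment.
        positive = filter(lambda amount: amount > 0, self.amounts)
        return reduce(lambda left, right: left + right, positive, 0.0)


invoice = Invoice([10.0, -2.5, 5.0])
```

Both methods return the same total, so the functional version can replace the structured one in a single place without touching the rest of the class.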

There is a language that may provide a path forward. Microsoft has its F# language, which is a Functional Programming language, but it also lives in .NET, which means one can combine F# modules with C# modules. If one has a system in C# and wants the advantages of Functional Programming, F# is worth investigating.

But beyond F#, I see no languages that offer a compromise between paradigms, and a way forward for managers and programmers. The current set of FP languages is too "pure" and doesn't allow the mixing of OOP, FP, and SP paradigms.

Perhaps we will see some new languages that bridge the worlds of Structured Programming, Object-oriented Programming, and Functional Programming.

Wednesday, July 31, 2019

Programming languages, structured or not, immediate or not

I had some spare time on my hands, and any of my friends will tell you that when I have spare time, I think about things. This time, I thought about programming languages.

That's not a surprise. I often think about programming languages. This time I thought about two aspects of programming languages that I call structuredness and immediacy. Immediacy is simply the rapidity with which a program can respond. The languages Perl, Python, and Ruby all have high immediacy, as one can start a REPL (for read-evaluate-print-loop) that takes input and provides the result right away. (In contrast, programs in the languages C#, Java, Go, and Rust must be compiled, so there is an extra step to get a response.)

Structuredness, in a language, is how much organization is encouraged by the language. I say "encouraged" because many languages will allow unstructured code. Some languages do require careful thought and organization prior to coding. Functional programming languages require a great deal of thought. Object-oriented languages such as C++, C#, and Java provide some structure. Old-school BASIC did not provide structure at all, with only a GOTO and a simple IF statement to organize your code. (Visual Basic has much more structure than old-school BASIC, and it is closer to C# and Java, although it has a bit more immediacy than those languages.)

My thoughts on structuredness and immediacy led me to think about the combination of the two. Some languages are high in one aspect, and some languages mix the two aspects. Was there an overall pattern?

I built a simple grid with structure on one axis and immediacy on the other. Structure was on the vertical axis: languages with high structure were higher on the chart, languages with less structure were lower. Immediacy was on the horizontal axis: languages with high immediacy were to the right, and languages that provided slower response were to the left.

Here's the grid:

                         structured
                              ^
     Go C++ Objective-C Swift |
          C# Java VB.NET      |
                              | Python Ruby
       (Pascal)               |    Matlab
                              |      Visual Basic
      C                       | SQL   Perl
      COBOL Fortran           |    JavaScript (Forth)
slow <------------------------+------------------------> immediate
       (FORTRAN)              |          R
                              |            (BASIC)
                              |
                              |
                              |
                              |            spreadsheet
                              v
                        unstructured

Some notes on the grid:
- Languages in parentheses are older, less-used languages.
- Fortran appears twice: "Fortran" is the modern version and "(FORTRAN)" is the 1960s version
- I have included "spreadsheet" as a programming language

Compiled languages appear on the left (slow) side. This is not related to the performance of programs written in these languages, but the development experience. When programming in a compiled language, one must edit the code, stop and compile, and then run the program. Languages on the right-hand side (the "immediate" side) do not need the compile step and provide feedback faster.

Notice that, aside from the elder FORTRAN, there are no slow, unstructured languages. Also notice that the structured immediate languages (Python, Ruby, et al.) cluster away from the extreme corner of structured and immediate. They are closer to the center.

The result is (roughly) a "main sequence" of programming languages, similar to the main sequence astronomers see in the types of stars. Programming languages tend to a moderate zone, where trade-offs are made between structure and immediacy.

The unusual entry was the spreadsheet, which I consider a programming language for this exercise. It appears in the extreme corner for unstructured and immediate. The spreadsheet, as a programming environment, is the fastest thing we have. Enter a value or a formula in a cell and the change "goes live" immediately. ("Before your finger is off the ENTER key", as a colleague would say.) This is faster than any IDE or compiler or interpreter for any other language.

Spreadsheets are also unstructured. There are no structures in spreadsheets, other than multiple sheets for different sets of data. While it is possible to carefully organize data in a spreadsheet, there is nothing that mandates the organization or even encourages it. (I'm thinking about the formulas in cells. A sophisticated macro programming language is a different thing.)

I think spreadsheets took over a specific type of computing. They became the master of immediate, unstructured programming. BASIC and Forth could not compete with them, and no language since has tried to compete with the spreadsheet. The spreadsheet is the most effective form of this kind of computing, and I see nothing that will replace it.

Therefore, we can predict that spreadsheets will stay with us for some time. It may not be Microsoft Excel, but it will be a spreadsheet.

We can also predict that programming languages will stay within the main sequence of compromise between structure and immediacy.

In other words, BASIC is not going to make a comeback. Nor will Forth, regrettably.

Monday, January 16, 2017

Discipline in programming

Programming has changed over the years. We've created new languages and added features to existing languages. Old languages that many consider obsolete are still in use, and still changing. (COBOL and C++ are two examples.)

Looking at individual changes, it is difficult to see a general pattern. But stepping back and getting a broader view, we can see that the major changes have increased discipline and rigor.

The first major change was the use of high-level languages in place of assembly language. Using high-level languages provided some degree of portability across different hardware (one could, theoretically, run the same FORTRAN program on IBM, Honeywell, and Burroughs mainframes). It meant a distant relationship with the hardware and a reliance on the compiler writers.

The next change was structured programming. It changed our notions of flow control, using "while", "if/then/else", and "for" structures and discouraged the use of "goto".

Then we adopted relational databases, separate from the application program. It required using an API (later standardized as SQL) rather than accessing data directly, and it required thought and planning for the database.

Relational databases forced us to organize data stored on disk. Object-oriented programming forced us to organize data in memory. We needed object models and, for very large projects, separate teams to manage the models.

Each of these changes added discipline to programming. The shift to compilers required reliable compilers and reliable vendors to support them. Structured programming applied rigor to the sequence of computation. Relational databases applied rigor to the organization of data stored outside of memory, that is, on disk. Object-oriented programming applied rigor to the organization of data stored in memory.

I should note that each of these changes was opposed. Each had naysayers, usually basing their arguments on performance. And to be fair, the initial implementation of each change did have lower performance than the old way. Yet each change had a group of advocates (I call them "the Pascal crowd" after the early devotees to that language) who pushed for the change. Eventually, the new methods were improved and accepted.

The overall trend is towards rigor and discipline. In other words, the Pascal crowd has consistently won the debates.

Which is why, when looking ahead, I think future changes will keep moving in the direction of rigor and discipline. There may be minor deviations from this path, with new languages introducing undisciplined concepts, but I suspect that they will languish. The successful languages will require more thought, more planning, and prevent more "dangerous" operations.

Functional programming is promising. It applies rigor to the state of our program. Functional programming languages use immutable objects, which once made cannot be changed. As the state of the program is the sum of the state of all variables, functional programming demands more thought given to the state of our system. That fits in with the overall trend.

So I expect that functional languages, like structured languages and object-oriented languages, will be gradually adopted and their style will be accepted as normal. And I expect more changes, all in the direction of improved rigor and discipline.

Thursday, July 21, 2016

Spaghetti in the Cloud

Will cloud computing eliminate spaghetti code? The question is a good one, and the answer is unclear.

First, let's understand the term "spaghetti code". It is a term that dates back to the 1970s according to Wikipedia and was probably an argument for structured programming techniques. Unstructured programming was harder to read and understand, and the term introduced an analogy of messy code.

Spaghetti code was bad. It was hard to understand. It was fragile, and small changes led to unexpected failures. Structured programming was, well, structured and therefore (theoretically) spaghetti programming could not occur under the discipline of structured programming.

But theory didn't work quite right, and even with the benefits of structured programming, we found that we had code that was difficult to maintain. (In other words, spaghetti code.)

After structured programming, object-oriented programming was the solution. Object-oriented programming, with its ability to group data and functions into classes, was going to solve the problems of spaghetti code.

Like structured programming before it, object-oriented programming didn't make all code easy to read and modify.

Which brings us to cloud computing. Will cloud computing suffer from "spaghetti code"? Will we have difficult-to-read and difficult-to-maintain systems in the cloud?

The obvious answer is "yes". Companies and individuals who transfer existing (difficult to read) systems into the cloud will have ... difficult-to-understand code in the cloud.

The more subtle answer is... "yes".

The problem of difficult-to-read code lies not in the programming style (unstructured, structured, or object-oriented) but in mutable state. "State" is the combination of values for all variables and changeable entities in a program. For a program with mutable state, these variables change over time. For one to read and understand the code, one must understand the current state, that is, the current value of all of those variables. But to know the current value of those variables, one must understand all of the operations that led to the current state, and that list can be daunting.

Functional programming (another programming technique) doesn't allow for mutable variables. Variables are fixed and unchanging. Once created, they exist and retain their value forever.
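
A small sketch of the difference, in Python. The `Counter` class and the `bumped` function are hypothetical names. With mutable state, the answer to a call depends on every call that came before it; without it, the answer depends only on the arguments.

```python
class Counter:
    """Mutable state: the result of bump() depends on every earlier call."""

    def __init__(self) -> None:
        self.count = 0

    def bump(self, step: int) -> int:
        self.count += step
        return self.count


def bumped(count: int, step: int) -> int:
    """No hidden state: the same arguments always give the same result."""
    return count + step


counter = Counter()
first = counter.bump(2)
second = counter.bump(2)    # same call, different answer

pure_first = bumped(0, 2)
pure_second = bumped(0, 2)  # same call, same answer
```

To predict `second`, a reader has to know that `bump` was already called once. To predict `pure_second`, the reader needs only the line itself.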

With cloud computing, programs (and variables) do not hold state. Instead, state is stored in databases, and programs run "stateless". Programs are simpler too, with a cloud system using smaller programs linked together with databases and message queues.
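
A minimal sketch of a "stateless" program in Python. A plain dictionary stands in for the external database, and the names `add_item`, `database`, and `cart-1` are hypothetical. The function keeps nothing between calls; everything it knows comes from the store and goes back to the store.

```python
def add_item(store: dict, cart_id: str, item: str) -> list:
    """Handle one request without keeping anything between calls."""
    items = list(store.get(cart_id, []))  # read the state from the store
    items.append(item)
    store[cart_id] = items                # write the new state back
    return items


database = {}  # stand-in for an external database

# Two separate "servers" are just two separate calls; neither remembers the other.
add_item(database, "cart-1", "apple")
result = add_item(database, "cart-1", "pear")
```

The state has not vanished, of course. It has moved into the store, which is why large and complicated systems are still possible.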

But that doesn't prevent people from moving large, complicated programs into the cloud. It doesn't prevent people from writing large, complicated programs in the cloud. Some programs in the cloud will be small and easy to read. Others will be large and hard to understand.

So, will spaghetti code exist in the cloud? Yes. But perhaps not as much as in previous technologies.

Wednesday, May 6, 2015

Cloud apps may not need OOP

The rise of different programming styles follows the need for program sizes.

The earliest programs were small -- tiny by today's standards. Most would fit on a single printed page; the largest took a few pages.

Programs larger than a few pages quickly became a tangled mess. Structured Programming (SP) was a reaction to the need for larger programs. It was a technique to organize code and allow programmers to quickly learn an existing system. With SP we saw languages that used structured techniques: Pascal and C became popular, and Fortran and BASIC changed to use the structured constructs IF-THEN-ELSE and DO-WHILE.

Structured Programming was able to organize code up to a point, but it could not manage the large systems of the 1990s. Object-oriented programming (OOP) was a reaction to the need for programs larger than several hundred printed pages. With OOP we saw languages that used object-oriented techniques: Java and C# became popular, C mutated into C++, and Pascal mutated into ObjectPascal. These new languages (and new versions of old languages) used the object-oriented constructs of encapsulation, inheritance, and polymorphism.

Cloud computing brings changes to programming, but in a new way. Instead of larger programs, cloud computing allows for (and encourages) smaller programs. The need for large, well-organized programs has been replaced by a need for well-organized systems of small programs. In addition, the needs placed on the small programs are different from the needs of the old, pre-cloud programs: cloud programs must be fast, replicable, and substitutable. The core idea of cloud computing is that a number of servers are ready to respond to requests and that any server (of a given class) can handle your request -- you don't need a specific server.

In this environment, object-oriented programming is less useful. It requires some overhead -- for the design of programs and at run time. Its strength is to organize code for large programs, but it offers little for small programs. I expect that people will move away from OOP languages for cloud systems, and towards languages that emphasize readability and reliability.

I don't expect a renaissance of Structured Programming. I don't expect anyone to move back to the older SP-inspired languages of Pascal and Fortran-77. Cloud computing may be the technology that pushes us to move to the "Functional Programming" style. Look for cloud-based applications to use functional languages such as Haskell and Erlang. (Maybe F#, for Microsoft shops.)

Thursday, April 18, 2013

Excel is the new BASIC

BASIC is a language that, to quote Rodney Dangerfield, gets no respect.

Some have quipped that "those whom the gods would destroy... they first teach BASIC".

COBOL may be disparaged, but only to a limited extent. People, deep down, know that COBOL is running many useful systems. (Things like banking, airline reservations, and perhaps most importantly payroll.) COBOL does work, and we respect it.

BASIC, on the other hand, tried to be useful but never really made it. Despite Microsoft's attempt with its MBASIC product, Digital Research with its CBASIC compiler, Digital Equipment Corporation with its various implementations of BASIC, and others, BASIC was always second place to other programming languages. For microcomputers, those languages were assembly language, Pascal, and C.

(I'm limiting this to the interpreter BASIC, the precursor to Visual Basic. Microsoft's Visual Basic was capable and popular. It was used for many serious applications, some of which are probably still running today.)

BASIC's challenge was its design. It was a language for learning the concepts of programming, not building large, serious programs. The name itself confirms this: Beginner's All-purpose Symbolic Instruction Code.

More than the name, the constructs of the programming language are geared for small programs. This is due to the purpose of BASIC (a better FORTRAN for casual users) and the timing of BASIC (the nascent "structured programming" movement had yet to prove itself).

Without the constructs of structured programming ("while" loops and "if/then/else" statements), programmers must either build their programs with structured concepts made of smaller elements, or build unstructured programs. BASIC allows you to build structured programs, but provides no assistance. Worse, BASIC relies on GOTO to build most control flows.

In contrast, modern programming languages such as Java, Python, and Ruby provide the constructs for structured programming and don't offer the GOTO statement. (C# does keep a goto statement, but restricts it to jumps within a method.)

The people who learned to program in BASIC (and I am one of them) learned to program poorly, and we have paid a heavy price for it.

But what does this have to do with Microsoft Excel?

Excel is the application taught to people for managing data. (Microsoft Word is suitable for documents, and Powerpoint is suitable for presentations, but Excel is *the* application for data. I suspect more people know and use Excel than all of the people using Word, Powerpoint, and Access.)

Excel offers the same undisciplined approach to applications. Spreadsheets contain data and formulas (and VBA macros, but I will ignore those for now).

One might argue that Excel is a spreadsheet, different from a programming language such as BASIC. Yet the differences are small. Excel, with its formulas alone, is a programming system if not a language.

The design of Excel (and other spreadsheets, going back to Visicalc) provides no support for structure or discipline. Formulas can collect data from anywhere in the spreadsheet. There is no GOTO keyword, but one can easily build a tangled mess.

Microsoft Excel is the new BASIC: useful, popular, and undisciplined. Worse than BASIC, since Excel is the premier tool for manipulating data. BASIC, for all of its flaws, was always second to some other language.

In one way, Excel is not as bad as BASIC. Formulas may collect data from any location in the spreadsheet, but they (for the most part) modify only their own contents. This provides a small amount of order to spreadsheet-programs.
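
That property can be sketched in a few lines of Python. This is a toy model, not how Excel works internally: the `evaluate` function and the cell names are assumptions for illustration, and there is no handling of circular references. Each formula is handed only a way to read other cells, so it has no means of assigning to them.

```python
def evaluate(cells: dict, name: str):
    """Return the value of one cell; formulas may read any cell but write none."""
    content = cells[name]
    if callable(content):
        # A formula receives only a read function, so it cannot assign elsewhere.
        return content(lambda other: evaluate(cells, other))
    return content


sheet = {
    "A1": 10,
    "A2": 32,
    "A3": lambda get: get("A1") + get("A2"),  # like =A1+A2
    "B1": lambda get: get("A3") * 2,          # like =A3*2
}

b1 = evaluate(sheet, "B1")
```

The formulas still reach anywhere they like for input, which is the tangle. But each one produces a value for its own cell only, which is the small amount of order.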

We need a new paradigm for data management. Just as programming had its "structured programming" movement which led to the use of constructs that improved the reliability and readability of programs, spreadsheets need a new approach to the organization of data and the types of formulas that can be used on that data.

Sunday, October 7, 2012

How I fix old code: I use the wisdom of those before me

I am often called in to a project to improve the code -- to make it more efficient, or more maintainable.  It seems that some programmers write code that is difficult to understand. (And luckily for me, a large number of them.)

I use the maxims provided by Kernighan and Plauger in their book "The Elements of Programming Style". Written in the 1970s, it contains wisdom that can be used by programmers today.

One of my favorites (possibly because so many programmers do not follow it) is "Each function should do one thing well."

Actually, Kernighan and Plauger wrote "Each module should do one thing well"; I am taking a (reasonable, in my mind) liberty to focus on functions. With today's object-oriented programming languages, the meaning of "module" is somewhat vague. Does it refer to the class (data, member functions, and all?) or does it refer to a separate compiled file (which may contain a single class, multiple classes, a portion of a class, or all three)? But such questions are distractions.

When I write code, I strive to make each function simple and easy to understand. I try to build functions of limited size. When I find that I have written a long function, I re-factor it into a set of smaller functions.

But these techniques are not used by all programmers. I have encountered no small number of programs which contain unreadable code. Some programs have large functions. Some programs have poor names of variables and functions. Some programs have complicated interactions between functions and data, otherwise known as "shared mutable state". It is these programs that I can improve.

Many times, I find that the earlier programmers have done a lot of the work for me, by organizing the code into reasonable chunks that just happen to be grouped into a single function. This makes it easy for me to create smaller functions, each doing one thing well.

I do more than reduce functions to maintainable sizes. I also move functions to their proper class. I find many functions have been placed in improper classes. Moving them to the right class makes the code simpler.

How does one know the proper class? I use this rule: When the function modifies data, the class that holds the data is the class that should hold the function. (In other words, only class functions can modify class data. Functions in one class cannot modify data in other classes.)

Functions that do not modify data but only read data belong to the class that holds the data.

If a function reads data from two classes, it is most likely doing too much and should be re-factored. If a function is changing data in two classes, it is definitely doing too much, and should definitely be re-factored. (These rules are for direct access of data. I have no problem with functions that invoke methods on multiple objects to retrieve or modify their data.)
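Here is a small C++ sketch of these placement rules. The `Account` class, its members, and the `transfer` function are all hypothetical names chosen for illustration. The functions that touch the balance directly live in the class that holds the balance; the function that works with two objects goes through their methods.

```cpp
#include <cassert>
#include <stdexcept>

class Account {
public:
    explicit Account(long balanceCents) : balanceCents_(balanceCents) {}

    // Modifies Account data, so it belongs to Account.
    void deposit(long amountCents) {
        if (amountCents <= 0) {
            throw std::invalid_argument("deposit must be positive");
        }
        balanceCents_ += amountCents;
    }

    // Only reads Account data, so it also belongs to Account.
    bool canCover(long amountCents) const {
        return balanceCents_ >= amountCents;
    }

    // Modifies Account data, so it belongs to Account.
    void withdraw(long amountCents) {
        if (amountCents <= 0 || !canCover(amountCents)) {
            throw std::invalid_argument("cannot withdraw that amount");
        }
        balanceCents_ -= amountCents;
    }

    long balanceCents() const { return balanceCents_; }

private:
    long balanceCents_;  // no code outside Account can change this directly
};

// Works with two objects, but only by invoking their methods.
void transfer(Account& from, Account& to, long amountCents) {
    from.withdraw(amountCents);
    to.deposit(amountCents);
}

// Usage sketch.
void checkTransfer() {
    Account checking(1000);
    Account savings(0);
    transfer(checking, savings, 400);
    assert(checking.balanceCents() == 600);
    assert(savings.balanceCents() == 400);
}
```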

These two simple rules ("each function should do one thing well" and "each function in its proper class") give me guidance for most of my improvements. They have served me well.

I succeed because I simplify code. I simplify code by using not my own rules, but the maxims laid out by those who came before me.

Wednesday, September 19, 2012

How to improve spreadsheets

Spreadsheets are the sharks of the IT world. They evolved before the IBM PC (the first spreadsheet ran on an Apple II) and have, for the most part, remained unchanged. They have migrated from the 6502 processor to the 8080, the Z-80, the 8086, and today's set. They have added graphs and fonts and extravagant file formats. They have grown from 256 rows to a rather large number. But they remain engines of data and formulas, with the data in well-defined grids and formulas with well-defined characteristics. They are sharks, evolved to a level of efficiency that has yet to be surpassed.

Spreadsheets get a number of things right:

- The syntax is easy to learn and consistent
- The data types are limited: numeric values, string values, and formulas
- Feedback for changes is immediate (no compiling or test scripts)
- Cells can only read values from other cells; they cannot assign a value to another cell

Immediate feedback is important. Teams using dynamic languages are at risk of a slew of problems; they use automated tests to identify errors. Agile processes emphasize the frequent use of tests. Spreadsheets provide this feedback without the need for tests.

The last item is most important. Cells can assign values to themselves, but they cannot assign values to other cells. This means that spreadsheets have no shared mutable state, no update collisions, no race conditions. That gives them stability.

Yet spreadsheets get a number of things wrong:

- All data is global
- Organization of data is dependent on the composer's skills and discipline

Our current spreadsheets are like the C language: fast, powerful, and dangerous. In C, one can do just about anything with the underlying machine. The C language lets you convert data from one form to another, and point to just about anything. It is quite powerful, but you have to know what you are doing.

Spreadsheets are not that dangerous. They don't have pointers, and the only things you can reference are cells within the spreadsheet (which are type safe, because they are all of one type).

But spreadsheets have the element of "you have to know what you are doing". The global nature of the data allows for formulas to refer to any cell (initialized or not) with little or no warning about nonsensical operations. In this sense, spreadsheet programming (in formulas) is much like C.

At first, I thought that the concepts of structured programming would improve spreadsheets. This is a false lead. Structured programming organizes code into sequences, iterations, and alternate paths. It clarifies code, but the formulas in spreadsheets are not a Turing-complete programming language. Structured programming can offer little to spreadsheets.

Instead, I think the concept of data encapsulation (from object-oriented programming) may help us advance the spreadsheet.

Spreadsheet authors tend to organize data into ranges. They may provide names for these ranges or leave them unnamed, but they will cluster their data into areas. Subsections of the grid will be used for specific types of data (for example, contact names in one range, regional sales in another).

For small spreadsheets, the "everything on one grid" concept works. Larger spreadsheets can see data split across pages (or tabs, depending on your spreadsheet manufacturer).

The problem with the spreadsheet grid is that it is, up to a point, infinite. We can add data to it without concern for the organization or structure. This becomes a problem over time; after a number of updates and revisions the effort to keep data organized becomes large.

An advanced spreadsheet would recognize that data is not stored in grids but in ranges, and would provide ranges as the key building block. Current spreadsheets let you define ranges, but the ability to operate on ranges is limited. In my new species of spreadsheet, the range would be the organizational unit, and ranges would not be infinite, empty grids. (They could expand, but only as a result of conscious action.)

Ranges are closer to tables in a database. Just as one can define a table and provide data for that table, one could define a range and provide data for that range. Unlike databases, ranges can be easily extended horizontally (more columns), re-sequenced, formatted, and edited. Unlike grids, ranges can be separated or re-combined to build new applications. Ranges must provide for local addresses (within the range) and external addresses (data within other ranges). Formulas must be able to read values from the current range and also from other ranges.
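To make the idea a little more concrete, here is a rough C++ sketch of a range as a data structure. Everything in it is an assumption made for illustration: the `Range` and `Workbook` classes, their method names, and the choice of numeric-only cells are hypothetical, not a design for a real product. It shows a bounded range with local addresses, growth only by an explicit call, and read-only external addresses by range name.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>  // std::out_of_range, thrown by the .at() calls below
#include <string>
#include <vector>

// Hypothetical range: a bounded block of numeric cells.
class Range {
public:
    Range(std::size_t rows, std::size_t cols)
        : cols_(cols), cells_(rows, std::vector<double>(cols, 0.0)) {}

    // Local address: row and column within this range only.
    // Addresses outside the range throw std::out_of_range.
    double at(std::size_t row, std::size_t col) const {
        return cells_.at(row).at(col);
    }

    void set(std::size_t row, std::size_t col, double value) {
        cells_.at(row).at(col) = value;
    }

    // The range grows only as a result of a conscious action.
    void addRow() { cells_.emplace_back(cols_, 0.0); }

    std::size_t rows() const { return cells_.size(); }
    std::size_t cols() const { return cols_; }

private:
    std::size_t cols_;
    std::vector<std::vector<double>> cells_;
};

// Hypothetical container of named ranges.
class Workbook {
public:
    // Defines a range, or returns the existing one with that name.
    Range& define(const std::string& name, std::size_t rows, std::size_t cols) {
        return ranges_.emplace(name, Range(rows, cols)).first->second;
    }

    // External address: range name plus a local address. Read-only,
    // so a formula in one range cannot assign to a cell in another.
    double read(const std::string& name, std::size_t row, std::size_t col) const {
        return ranges_.at(name).at(row, col);
    }

private:
    std::map<std::string, Range> ranges_;
};

// Usage sketch.
void checkRanges() {
    Workbook book;
    Range& sales = book.define("sales", 2, 2);
    sales.set(1, 1, 42.5);
    assert(book.read("sales", 1, 1) == 42.5);
}
```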

If we do it right, ranges will be able to live in the cloud, being called in when needed and stored when not. Ranges will also be members of one application (or multiple applications), serving data to whatever application needs it.

Any improved spreadsheet will have to retain the advantages of the current tools. The immediacy of spreadsheets is a big advantage, allowing users of all skill levels to become proficient in a short time. Changing from grid-based spreadsheets to range-based spreadsheets must allow for this immediacy. This is a function of the UI, something that must be designed carefully.

I think that this new form of spreadsheet is possible, and offers some advantages. Now all I need is some time to implement it.

Tuesday, August 28, 2012

The deception of C++'s 'continue' and 'break'

Pick up any C++ reference book, visit any C++ web site, and you will see that the 'continue' and 'break' keywords are grouped with the loop constructs. In many ways it makes sense, since you can use these keywords with only those constructs.

But the more I think about 'continue' and 'break', the more I realize that they are not loop constructs. Yes, they are closely associated with 'while' and 'for' and 'case' statements, but they are not really loop constructs.

Instead, 'continue' and 'break' are variations on a different construct: the 'goto' keyword.

The 'continue' and 'break' statements in loops bypass blocks of code. 'continue' transfers control to the end of the loop block and allows the next iteration to begin. 'break' transfers control past the end of the loop block and forces the loop to end (allowing code after the loop to execute). These are not loop operations but 'transfer of control' operations, or 'goto' operations.
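A small C++ sketch makes the equivalence visible. The two functions below are hypothetical examples written for this post: the first uses 'continue' and 'break', and the second replaces each keyword with a 'goto' to a label at the spot where the keyword would transfer control.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums positive values, skipping negatives and stopping at the first zero.
int sumWithKeywords(const std::vector<int>& values) {
    int sum = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (values[i] < 0) continue;  // skip the rest of this iteration
        if (values[i] == 0) break;    // leave the loop entirely
        sum += values[i];
    }
    return sum;
}

// The same logic with each keyword spelled as an explicit 'goto'.
int sumWithGoto(const std::vector<int>& values) {
    int sum = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (values[i] < 0) goto next_iteration;  // was 'continue'
        if (values[i] == 0) goto after_loop;     // was 'break'
        sum += values[i];
    next_iteration: ;  // end of the loop block; ++i runs next
    }
after_loop:
    return sum;
}

// Usage sketch: both versions agree.
void checkEquivalence() {
    const std::vector<int> values{3, -1, 4, 0, 5};
    assert(sumWithKeywords(values) == sumWithGoto(values));
}
```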

Now, modern programmers have declared that 'goto' operations are evil and must never, ever be used. Therefore, 'continue' and 'break', as 'goto' in disguise, are evil and must never, ever be used.

(The 'break' keyword can be used in 'switch/case' statements, however. In that context, a 'goto' is exactly the construct that we want.)

Back to 'continue' and 'break'.

If 'continue' and 'break' are merely cloaked forms of 'goto', then we should strive to avoid their use. We should seek out the use of 'continue' and 'break' in loops and re-factor the code to remove them.
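As one possible shape for that re-factoring, here is a C++ sketch. The functions are hypothetical examples, and the flag-in-the-loop-condition approach is only one of several ways to remove the keywords. The 'break' becomes part of the loop condition, and the 'continue' becomes an ordinary 'if/else'.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Before: sums positive values, skipping negatives, stopping at the first zero.
int sumWithKeywords(const std::vector<int>& values) {
    int sum = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (values[i] < 0) continue;
        if (values[i] == 0) break;
        sum += values[i];
    }
    return sum;
}

// After: the exit condition is stated in the loop header, and the
// skipped case is an explicit branch. No 'continue', no 'break'.
int sumStructured(const std::vector<int>& values) {
    int sum = 0;
    bool sawZero = false;
    for (std::size_t i = 0; i < values.size() && !sawZero; ++i) {
        if (values[i] == 0) {
            sawZero = true;
        } else if (values[i] > 0) {
            sum += values[i];
        }
    }
    return sum;
}

// Usage sketch: the re-factored loop gives the same answer.
void checkStructured() {
    const std::vector<int> values{3, -1, 4, 0, 5};
    assert(sumStructured(values) == sumWithKeywords(values));
}
```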

I will be looking at code in this light, and searching for the 'continue' and 'break' keywords. When working on systems, I will make their removal one of my metrics for the improvement of the code.