Thursday, December 6, 2018

Rebels need the Empire

The PC world is facing a crisis. It is a silent crisis, one that few people understand.

That crisis is the evil empire, or more specifically, the lack of an evil empire.

For the entire age of personal computers, we have had an evil empire. The empire changed over time, but there was always one. And that empire was the unifying force for the rebellion.

The first empire was IBM. Microcomputer enthusiasts were fighting this empire of large, expensive mainframe computers. We fought it with small, inexpensive (compared to mainframes) computers. We offered small, interactive, "friendly" programs written in BASIC in opposition to batch mainframe systems written in COBOL. The rebellion used Apple II, TRS-80, and other small systems to unite and fight for liberty. This rebellion was successful. So successful that IBM decided to get in on the personal computer action.

The second empire was also IBM. The IBM PC became the standard for computing, and the diverse set of computers prior to the IBM model 5150 was wiped out. Rebels refused to use IBM PCs and attempted to keep non-PC-compatible computers financially viable. That struggle was lost, and the IBM design became the standard design. Once Compaq introduced a PC-compatible (and didn't get sued), other manufacturers introduced their own PC compatibles. The one remnant of this rebellion was Apple, who made non-compatible computers for quite some time.

The third empire was Microsoft. The makers of IBM-compatible PCs needed an operating system, and Microsoft was happy to sell them MS-DOS. IBM challenged Microsoft with OS/2 (itself a joint venture with Microsoft) but Microsoft introduced Windows and made it successful. Microsoft was so successful that its empire was, at times, considered larger and grander than the IBM mainframe empire. The rebellion against Microsoft took some time to form, but it did arise as the "open source" movement.

But Microsoft has fallen from its position as evil empire. It still holds a majority of desktop computer operating systems, but the world of computing has expanded to web servers, smartphones, and cloud systems, and these are outside of Microsoft's control.

In tandem with Microsoft's decline, open source has become accepted as the norm. As such, it is no longer the rebellion. The software market is now tripartite: Windows, macOS, and Linux. Each is an acceptable solution.

Those two changes -- Microsoft no longer the evil empire and open source no longer the rebellion -- mean that, at the moment, there is no evil empire.

Some companies have large market shares of certain segments. Amazon.com dominates the web services and cloud market -- but competitors are reasonable and viable options. Microsoft dominates the desktop market, especially the corporate desktop market, but Apple is a possible choice for the corporate desktop.

No one vendor controls the hardware market.

Facebook dominates in social media, but is facing significant challenges in areas of privacy and "fake news". Other media channels like Twitter are looking to gain at Facebook's expense.

Even programming languages have no dominant player. According to the November report from Tiobe, Java and C have been the two most popular languages and neither is gaining significantly. The next three (C++, Python, and VB.net) are close, as are the five following (C#, JavaScript, PHP, SQL, and Go). No language is emerging as a dominant language, as we had with BASIC in the 1980s and Visual Basic in the 1990s.

A world without an evil empire is a new world for us. Personal computers were born under an evil empire, operating systems matured under an evil empire, and open source became respectable under an evil empire. I like to think that such innovations were driven (or at least inspired) by a rebellion, an active group of people who rejected the market leader.

Today we have no such empire. Will innovation continue without one? Will we see new hardware, new programming languages, new tools? Or will the industry stagnate as major players focus more on market share and less on innovation?

If the latter, then perhaps someday a new market leader will emerge, strong enough to win the title of "evil empire" and rebels will again drive innovation.

Thursday, November 8, 2018

Why is there still a MacBook Air?

This week, Apple introduced upgrades to a number of its products. They showed a new Mac Mini and a new MacBook Air. The need for a new Mac Mini I understand. The need for a new MacBook Air I do not.

The original MacBook Air was revolutionary in that it omitted a CD/DVD reader. So revolutionary that Apple needed a way for a MacBook Air to "borrow" a CD/DVD reader from another computer (another Apple computer) to install software.

The MacBook Air stunned the world with its thinness and its low weight -- hence the adjective "Air". Compared to laptops of the time, even Apple's MacBooks, the MacBook Air was almost weightless.

But that was then. This is now.

Apple has improved the MacBook (without the "Air") to the point that MacBooks and MacBook Airs are indistinguishable. They are both thin. They are both lightweight. They both have no CD/DVD reader.

Yes, there are some minor points and one can tell a MacBook from a MacBook Air. MacBooks are slightly smaller and have only one USB C port, whereas MacBook Airs are larger and have multiple ports.

But in just about every respect, the MacBook Air is a new and improved MacBook. When you consider the processor, the memory and storage, the display, and the capabilities of the two devices, the MacBook Air is simply another member of the MacBook line. So why keep it? Why not just call it a MacBook?

Apple could certainly have two MacBooks. They have two MacBook Pro computers, a 13-inch model and a 15-inch model. They could have a 12-inch MacBook and a 13-inch MacBook. Yet they keep the "Air" designation. Why?

It's possible that the "MacBook Air" name has good market recognition, and Apple wants to leverage that. If so, we can expect to see other "Air" products, much like the iPad Air.

Monday, October 29, 2018

IBM and Red Hat Linux

The news that IBM had an agreement to purchase Red Hat (the distributor and supporter of a Linux distro for commercial use) was followed quickly by a series of comments from the tech world, ranging from anger to disappointment.

I'm not sure that the purchase of Red Hat by IBM is a bad thing.

One can view this event in the form of two questions. The first is "Should Red Hat sell itself (to anyone)?". The second is "Given that Red Hat is for sale, who would be a good purchaser?".

The negative reaction, I think, is mostly about the first question. People are disappointed (or angered) by the sale of Red Hat -- to anyone.

But once you commit to a sale, the question changes and the focus is on the buyer. Who are possible buyers for Red Hat?

IBM is, of course, a possibility. Many people might object to IBM, and if we think of the IBM from its monopoly days and its arrogance and incompatible hardware designs, then IBM would be a poor choice. (Red Hat would also be a poor acquisition for that IBM.)

But IBM has changed quite a bit. It still sells mainframes; its S/36 line has mutated into servers, and it has sold (long ago) its PC business. It must compete in the cloud arena with Amazon.com, Microsoft, and Google (and Dell, and Oracle, and others). Red Hat helps IBM in this area. I think IBM is not so foolish as to break Red Hat or make many changes.

One possibility is that IBM purchased Red Hat to prevent others from doing so. (You buy something because you need it or because you want to keep it from others.) Who are the others?

Amazon.com and Microsoft come quickly to mind. They both offer cloud services, and Red Hat would help both with their offerings. The complainers may consider this; would they prefer Red Hat to go to Amazon or Microsoft? (Of the two, I think Microsoft would be the better owner. It is expanding its role with Linux and moving its business away from Windows and Windows-only software to a larger market of cloud services that support both Windows and Linux.)

There are other possible purchasers. Oracle has been mentioned by critics (usually as a "could be worse, could be Oracle" comment). Red Hat fills a gap in Oracle's product line between hardware and its database software, and also provides a platform for Java (another Oracle property).

Beyond those, there are Facebook, Dell, and possibly Intel, although I consider the last to be a long shot. None of them strike me as a good partner.

Red Hat could be purchased by an equity/investment company, which would probably doom Red Hat to partitioning and sales of individual components.

In the end, IBM seems quite a reasonable purchaser. IBM has changed from its old ways and it supports Linux quite a bit. I think it will recognize value and strive to keep it. Let's see what happens.

Tuesday, October 23, 2018

It won't be Eiffel

Bertrand Meyer has made the case, repeatedly, for design-by-contract as a way to improve the quality of computer programs. He has been doing so for the better part of two decades, and perhaps longer.

Design-by-contract is a notion that uses preconditions, postconditions, and invariants in object-oriented programs. Each is a form of an assertion, a test that is performed at a specific time. (Preconditions are checked before a function runs, postconditions after it returns, and invariants at both points.)

Design-by-contract is a way of ensuring that programs are correct. It adds rigor to programs, and requires careful analysis and thought in the design of software. (Much like structured programming required analysis and thought for the design of software.)
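
To make the idea concrete, here is a small sketch -- not Eiffel, just ordinary C++ with assert(), and an invented Account class -- showing where the three kinds of checks would sit:

    #include <cassert>

    // A hypothetical account class with one class invariant:
    // the balance is never negative.
    class Account {
    public:
        explicit Account(int opening_balance) : balance_(opening_balance) {
            assert(invariant());                       // invariant holds on construction
        }

        void withdraw(int amount) {
            assert(amount > 0);                        // precondition
            assert(amount <= balance_);                // precondition
            int old_balance = balance_;

            balance_ -= amount;

            assert(balance_ == old_balance - amount);  // postcondition
            assert(invariant());                       // invariant still holds
        }

        int balance() const { return balance_; }

    private:
        bool invariant() const { return balance_ >= 0; }
        int balance_;
    };

    int main() {
        Account a(100);
        a.withdraw(40);
        return a.balance() == 60 ? 0 : 1;
    }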

I think it has a good chance of being accepted as a standard programming practice. It follows the improvements we have seen in programming languages: Bounds checking of indexes for arrays, function signatures, and type checking rules for casting from one type to another.

Someone will create a language that uses the design-by-contract concepts, and the language will gain popularity. Perhaps because of the vendor (Microsoft? Google?) or perhaps through grass-roots acceptance (a la Python).

There already is a language that implements design-by-contract: Eiffel, Meyer's pet language. It is available today, even for Linux, so developers can experiment with it. Yet it has attracted little interest. The Eiffel language does not appear on the Tiobe index (at least not for September 2018) at all -- not only not in the top 20, but not in the top 100. (It may be lurking somewhere below that.)

So while I think that design-by-contract may succeed in the future, I also think that Eiffel has missed its opportunity. It hasn't been accepted by any of the big vendors (Microsoft, Oracle, Google, Apple) and its popularity remains low.

I think that another language may pick up the notion of preconditions and postconditions. The term "Design by contract" is trademarked by Meyer, so it is doubtful that another language will use it. But the term is not important -- it is the assertions that bring the rigor to programming. These are useful, and eventually will be found valuable by the development community.

At that point, multiple languages will support preconditions and postconditions. There will be new languages with the feature, and adaptations of existing languages (C++, Java, C#, and others) that sport preconditions and postconditions. So Bertrand Meyer will have "won" in the sense that his ideas were adopted.

But Eiffel, the language, will be left out.

Tuesday, October 9, 2018

C without the preprocessor

The C and C++ languages lack one utility that is found in many other languages: a package manager. Will they ever have one?

The biggest challenge to a package manager for C or C++ is not the package manager. We know how to build them, how to manage them, and how to maintain a community that uses them. Perl, Python, and Ruby have package managers. Java has one (sort of). C# has one. JavaScript has several! Why not C and C++?

The issue isn't in the C and C++ languages. Instead the issue is in the preprocessor, an external utility that modifies C or C++ code before the compiler does its work.

The problem with the preprocessor is that it can change just about any token in the code to something else, including statements which would be used by package managers. The preprocessor can change "do_this" to "do_that" or change "true" to "TRUE" or change "BEGIN" to "{".
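
A contrived example (the names are my own invention) shows how far this can go; the compiler never sees the tokens that a package manager would see in the source:

    #include <cstdio>

    // The preprocessor rewrites these tokens before compilation.
    #define BEGIN {
    #define END   }
    #define do_this do_that

    void do_that() { std::puts("called do_that"); }

    int main()
    BEGIN
        do_this();   // after preprocessing, this line reads: do_that();
        return 0;
    END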

The idea of a package manager for C and C++ has been discussed, and someone (I forget the person now) listed a number of questions that the preprocessor raises for a package manager. I won't repeat the list here, but they were very good questions.

To me, it seems that a package manager and a preprocessor are incompatible. If you have one, you cannot have the other. (At least, not with any degree of consistency.)

So I started thinking... what if we eliminate the C/C++ preprocessor? How would that change the languages?

Let's look at what the preprocessor does for us.

For starters, it is the mechanism to include headers in programs. The "#include" lines are handled by the preprocessor, not the compiler. (When C was first designed, a preprocessor was considered a "win", as it separated some tasks from the compiler and followed the Unix philosophy of separation of duties.) We still need a way to include definitions of constants, functions, structures, and classes, so we need a replacement for the #include command.

A side note: C and C++ standards wonks will know that the standards do not require that the preprocessor, rather than the compiler, handle "#include" lines. The standards dictate only that after certain lines (such as #include "string") the compiler must exhibit certain behaviors. But this bit of arcane knowledge is not important to the general idea of eliminating the preprocessor.

The preprocessor allows for conditional compilation. It allows for "#if/#else/#endif" blocks that can be conditionally compiled, based on what follows the "#if". Conditional compilation is extremely useful on software that has multiple targets, such as the Linux kernel (which targets many different processors).

The preprocessor also allows for macros and substitution of values. It accepts a "#define" line which can change any token into something else. This mechanism was used for the "max()" and "min()" functions.
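
A small, hypothetical example shows all three uses -- inclusion, conditional compilation, and a macro -- in one place:

    #include <cstdio>   // inclusion: the preprocessor pastes in the header text

    // Conditional compilation: pick a code path based on a symbol.
    // (VERBOSE_LOGGING is an invented name for this sketch.)
    #define VERBOSE_LOGGING 1

    // A classic function-like macro, once the usual way to get max().
    #define MAX(a, b) ((a) > (b) ? (a) : (b))

    int main() {
        int larger = MAX(3, 7);

    #if VERBOSE_LOGGING
        std::printf("the larger value is %d\n", larger);
    #else
        std::printf("%d\n", larger);
    #endif

        return 0;
    }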

All of that would be lost with the elimination of the preprocessor. As all of those features are used on many projects, they would all have to be replaced by some form of extension to the compiler. The compiler would have to read the included files, and would have to compile (or not compile) conditionally-marked code.

Such a change is possible, but not easy. It would probably break a lot of existing code -- perhaps all nontrivial C and C++ programs.

Which means that removing the preprocessor from C and C++ and replacing it with something else changes the languages so much that they are no longer C and C++. They become different languages, deserving of different names.

So in one sense you can remove the preprocessor from C and C++, but in another sense you cannot.

Friday, September 28, 2018

Macbooks are not an incentive

I've seen a number of job postings that include the line "all employees use MacBooks".

I suppose that this is intended as an enticement. I suppose that a MacBook is considered a "perk", a benefit of working at the company. Apple equipment is considered "cool", for some reason.

I'm not sure why.

MacBooks in 2018 are decent computers, but I find that they are inferior to other computers, especially when it comes to development.

I've been using computers for quite some time, and programming for most of that time. I've used MacBooks and Chromebooks and modern PCs. I've used older PCs and even ancient PCs with IBM's Model M keyboard. I've worked on IBM's System/23 (which was the origin of the first IBM PC keyboard). I have even used model 33 ASR Teletype terminals, which are large mechanical beasts that print uppercase on roll paper and do a poor job of it. So I know what I like.

And I don't like Apple's MacBook and MacBook Pro computers. I dislike the keyboard; I want more travel in the keys. I dislike the touchpad in front of the keyboard; I prefer the small pointing stick embedded in Lenovo and some Dell laptop keyboards. I dislike Apple's displays, which are too bright and too reflective. I want "matte" finish displays which hide reflections from light sources such as windows and ceiling lights.

My main client provides a computer, one that I must use when working for them. The computer is a Dell laptop, with a high-gloss display and a keyboard that is a bit better than current Apple keyboards, but not by much. I supplement the PC with a matte-finish display and a Matias "Quiet Pro" keyboard. These make the configuration much more tolerable.

Just as I "fixed" the Dell laptop, I could "fix" a MacBook Pro with an additional keyboard and display. But once I do that, why bother with the MacBook? Why not use a Mac Mini, or for that matter any small-factor PC? The latter would probably offer just as much memory and disk, and more USB ports. And cost less. And run Linux.

It may be some time before companies realize that developers have strong opinions about the equipment that they use. I think that they will, and when they do, they will provide developers with choices for equipment -- including the "bring your own" option.

And it may be some time before developers realize that Apple MacBooks are not the best for development. Apple devices have a lot of glamour, but glamour doesn't get the job done -- at least not for me. Apple designs computers for visual appeal, and I need good ergonomic design.

I'm not going to forbid developers from using Apple products, or demand that everyone use the same equipment that I use. I will suggest that developers try different equipment, see which devices work for them, and understand the benefits of those devices. Pick your equipment for the right reasons, not because it has a pretty logo.

In the end, I find the phrase "all employees use MacBooks" to be a disincentive, a reason to avoid a particular gig. Because I would rather be productive than cool.

Tuesday, September 18, 2018

Programming languages and the GUI

Programming languages and GUIs don't mix. Of all the languages available today, none are GUI-based languages.

My test for a GUI-based language is the requirement that any program written in the language must use a GUI. If you can write a program and run it in a text window, then the language is not a GUI-based language.

This is an extreme test, and perhaps unfair. But it shows an interesting point: We have no GUI-based languages.

We had programming before the GUI with various forms of input and output (punch cards, paper tape, magnetic tape, disk files, printers, and terminals). When GUIs came along, we rushed to create GUI programs but not GUI programming languages. (Except for Visual Basic.) We still have GUIs, some thirty years on, and today we have no GUI programming languages.

Almost all programming languages treat windows (or GUIs) as an afterthought. Programming for the GUI is bolted on to the language as a library or framework; it is not part of the core language.

For some languages, the explanation is obvious: the language existed before GUIs existed (or became popular). Languages such as Cobol, Fortran, PL/I, Pascal, and C had been designed before GUIs appeared on the horizon. Cobol and Fortran were designed in an era of magnetic tapes, disk files, and printers. Pascal and C were created for printing terminals or "smart" CRT terminals such as DEC's VT-52.

Some languages were designed for a specific purpose. Such languages have no need of GUIs, and they don't have any GUI support. AWK was designed as a text processing language, a filter that fit in with Unix's tool-chain philosophy. SQL was designed for querying databases (and prior to GUIs).

Other languages were designed after the advent of the GUI, and for general-purpose programming. Languages such as Java, C#, Python, and Ruby came to life in the "graphical age", yet graphics is an extension of the language, not part of the core.

Microsoft extended C++ with its Visual C++ package. The early versions were a daunting mix of libraries, classes, and #define macros. Recent versions are more palatable, but C++ remains C++ and the GUI parts are mere add-ons to the language.

Borland extended Pascal, and later provided Delphi, for Windows programming. But Object Pascal and Windows Pascal and even Delphi were just Pascal with GUI programming bolted on to the core language.

The only language that put the GUI in the language was Visual Basic. (The old Visual Basic, not the VB.NET language of today.) It not only supported a graphical display, it required one.

I realize that there may be niche languages that handle graphics as part of the core language. Matlab and R support the generation of graphics to view data -- but they are hardly general-purpose languages. (One would not write a word processor in R.)

Mathematica and Wolfram do nice things with graphics too, but again, for rendering numerical data.

There are probably other obscure languages that handle GUI programming. But they are obscure, not mainstream. The only other (somewhat) popular language that required a graphical display was Logo, and that was hardly a general-purpose language.

The only popular language that handled the GUI as a first-class citizen was Visual Basic. It is interesting to note that Visual Basic has declined in popularity. Its successor, VB.NET, is a rough translation of C#, and the GUI is, as in other languages, something added to the core language.

Of course, programming (and system design) today is very different from the past. We design and build for mobile devices and web services, with some occasional web applications. Desktop applications are considered passe, and console applications are not considered at all (except perhaps for system administrators).

Modern applications place the user interface on a mobile device. The server provides services, nothing more. The GUI has moved from the desktop PC to the web browser and now to the phone. Yet we have no equivalent of Visual Basic for developing phone apps. The tools are desktop languages with extensions for mobile devices.

When will we get a language tailored to phones? And who will build it?

Wednesday, July 18, 2018

Another idea for Amazon's HQ2

Amazon has announced plans for an "HQ2", a second headquarters office. They have garnered attention in the press by asking cities to recommend locations (and provide benefits). The focus has been on which city will "win" the office ("winning" may be more expensive than cities realize) and on the criteria for Amazon's selection.

The question that no one has asked is: Why does Amazon want a second headquarters?

Amazon projects growth over the next decade, and it will need employees in various capacities. But that does not require a second headquarters. Amazon could easily expand with a number of smaller buildings, spread across the country. They could do it rather cheaply too, as there are a large number of available buildings in multiple cities. (Although the buildings may be in cities that are not where Amazon wants to locate its offices.)

Amazon also has the option to expand its workforce by using remote workers, letting employees work from home.

Why does Amazon want a single large building with so many employees? Why not simply buy (or lease) smaller buildings?

Maybe, just maybe, Amazon has another idea.

It is possible that Amazon is preparing to split into two companies. This would make some sense: Amazon has seen a lot of growth, and managing two smaller companies (with a holding company atop both) may have advantages.

The most likely split would be into one company for online and retail sales, and a second for its web and cloud services. These are two different operations, with different markets and different needs. Both operations are profitable, and Amazon does not need to subsidize one from the other.

Dividing into two companies gives some sense to a second headquarters office. Once filled, Amazon could easily split into two companies, each with its own headquarters. That could be Amazon's strategy.

I could, of course, be quite wrong about this. I have no relationship with Amazon, except as an occasional customer, so I have no privileged information.

But if they do split, I expect the effect on the market to be minimal. The two sides (retail and web services) are fairly independent.

Thursday, June 7, 2018

Cable channels and web accounts

Web accounts are like cable channels, in that too many can be a bad thing.

When cable TV arrived in my home town (many moons ago), the cable company provided two things: a cable into the house, and a magical box with a sliding channel selector. The channels ranged from 2 through 13, and then from 'A' to 'Z'. We dutifully dialed our TV to channel 3 and then selected channels using the magical box.

At first, the world of cable TV was an unexplored wilderness for us, with lots of channels we did not have with our older, over-the-air reception. We carefully moved from one channel to another, absorbing the new programs available to us.

The novelty quickly wore off, and soon we would "surf" through the channels, viewing each one quickly and then deciding on a single program to watch. It was a comprehensive poll of the available programs, and it worked with only 38 channels.

In a later age of cable TV, when channels were more numerous, I realized that the "examine every channel" method would not work. While you can skip through 38 (or even 100) channels fairly quickly, you cannot do the same for 500 channels. It takes some amount of time to peek at a single channel, and that amount of time adds up for large numbers of channels. (The shift from analog to digital slowed the progress, because digital cable lags as the tuner resets to the new channel.)

In brief, once you have a certain number of channels, the "surf all and select" method takes too long, and you will have missed some amount of the program you want. (With enough channels, you may have missed the entire program!)

What does all of this have to do with web accounts?

Web accounts have a similar effect, not with surfing for content (although I suppose that could be a problem for someone who visits a large number of web sites) but with user names and passwords. Web accounts need maintenance. Not daily, or weekly, or monthly, but they do need maintenance from time to time. Web accounts break, or impose new rules for passwords, or require that you accept a new set of terms and conditions.

For me, it seems that at any given time, one account is broken. Not always the same account, and sometimes the number is zero and sometimes the number is two, but on average it seems that I am fixing an account at least once per week. This week I have two accounts that need work. One is my Apple account. The other is with yast.com, who billed my annual subscription to two different credit cards.

With a handful of accounts, this is not a problem. But with a large number of accounts... you can see where this is going. With more and more accounts, you must spend more and more time on "maintenance". Eventually, you run out of time.

Our current methods of authentication are not scalable. The notion of independent web sites, each with its own ID and password, each with its own authentication mechanism, works for a limited number of sites, but contains an upper bound. That upper bound varies from individual to individual, based on their ability to remember IDs and passwords, and their ability to fix problems.

We will need a different authentication mechanism (or set of mechanisms) that is more reliable and simpler to operate.

We have the beginnings of standardized authentication. Facebook and Google logins are available, and many sites use them. There is also OAuth, an open standard, which may also help standardize authentication. It is possible that the US government will provide an authentication service, perhaps through the Postal Service. I don't know which will become successful, but I believe that a standard authentication mechanism is coming.

Tuesday, May 8, 2018

Refactor when you need it

The development cycle for Agile and TDD is simple:
  • Define a new requirement
  • Write a test for that requirement
  • Run the test (and see that it fails)
  • Change the code to make the test pass
  • Run the test (and see that it passes)
  • Refactor the code to make it clean
  • Run the test again (and see that it still passes)
Notice that refactor step near the end? That is what keeps your code clean. It allows you to write a messy solution quickly.

A working solution gives you a good understanding of the requirement, and its effect on the code. With that understanding, you can then improve the code, making it clear for other programmers. The test keeps your revised solutions correct -- if a cleanup change breaks a test, you have to fix the code.

But refactoring is not limited to after a change. You can refactor before a change.

Why would you do that? Why would you refactor before making any changes? After all, if your code is clean, it doesn't need to be refactored. It is already understandable and maintainable. So why refactor in advance?

It turns out that code is not always perfectly clean. Sometimes we stop refactoring early. Sometimes we think our refactoring is complete when it is not. Sometimes we have duplicate code, or poorly named functions, or overweight classes. And sometimes we are enlightened by a new requirement.

A new requirement can force us to look at the code from a different angle. We can see new patterns, or see opportunities for improvement that we failed to see earlier.

When that happens, we see new ways of organizing the code. Often, the new organization allows for an easy change to meet the requirement. We might refactor classes to hold data in a different arrangement (perhaps a dictionary instead of a list) or break large-ish blocks into smaller blocks.
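
As a sketch of that kind of refactoring (the class and field names here are invented), suppose the new requirement is "look up customers by name"; moving the data from a plain list into a keyed structure first, while the existing tests still pass, makes the feature itself a one-line change:

    #include <map>
    #include <string>

    struct Customer {
        std::string name;
        int balance;
    };

    // Refactoring step: the customers used to live in a plain list that
    // callers scanned; they now live in a map keyed by name. Behavior is
    // unchanged, so the existing tests still pass.
    class CustomerStore {
    public:
        void add(const Customer& c) { by_name_[c.name] = c; }

        // The new requirement, trivial once the data is keyed by name.
        const Customer* find(const std::string& name) const {
            auto it = by_name_.find(name);
            return it == by_name_.end() ? nullptr : &it->second;
        }

    private:
        std::map<std::string, Customer> by_name_;
    };

    int main() {
        CustomerStore store;
        store.add({"Ada", 100});
        const Customer* c = store.find("Ada");
        return (c != nullptr && c->balance == 100) ? 0 : 1;
    }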

In this situation, it is better to refactor the code before adding the new requirement. Instead of adding the new feature and refactoring, perform the operations in reverse sequence: refactor and then add the requirement. (Of course, you still test and you can still refactor at the end.) The full sequence is:
  • Define a new requirement
  • Write a test for that requirement
  • Run the test (and see that it fails)
  • Examine the code and identify improvements
  • Refactor the code (without adding the new requirement)
  • Run tests to verify that the code still works (skip the new test)
  • Change the code to make the test pass
  • Run the test (and see that it passes)
  • Refactor the code to make it clean
  • Run the test again (and see that it still passes)
The new steps are the three in the middle: examining the code, refactoring it, and re-running the existing tests.

Agile has taught us to change our processes when the changes are beneficial, and changing the Agile process itself is part of that. You can refactor before making changes. You should refactor before making changes, when the refactoring will help you.

Saturday, April 21, 2018

Why the ACM is stuck in academe

The Association for Computing Machinery (ACM) is a professional organization. Its main web page claims it is "Advancing Computing as a Science & Profession". In 2000, it recognized that it was focused exclusively on the academic world, and it also recognized that it had to expand. It has struggled with that expansion for the past two decades.

I recently found an example of its failure.

The flagship publication, "Communications of the ACM", is available on paper or on-line. (So far, so good.) It is available to all comers, with only some articles locked behind a paywall. (Also good.)

But the presentation is bland, almost stifling.

The Communications web site follows a standard "C-clamp" layout, with content in the center and links and administrative items wrapped around it on the top, left, and bottom. An issue's table of contents has titles (links) and descriptions for the individual articles of the magazine. This is a reasonable arrangement.

Individual articles are presented with header and footer, but without the left-side links. They are not using the C-clamp layout. (Also good.)

The fonts and colors are appealing, and they conform to accessibility standards.

But the problem that shows how ACM fails to "get it" is with the comments. Their articles still have comments (which is good) but very few people comment. So few that many articles have no comments. How does ACM present an article with no comments? How do they convey this to the reader? With a single, mechanical phrase under the article text:

No entries found

That's it. Simply the text "no entries found". It doesn't even have a header describing the section as a comments section. (There is a horizontal rule between the article and this phrase, so the reader has some inkling that "no entries found" is somehow distinct from the article. But nothing indicating that the phrase refers to comments.)

Immediately under the title at the top of the page there is a link to comments (labelled "Comments") which is a simple intrapage link to the empty, unlabelled comments section.

I find phrase "no entries found" somewhat embarrassing. In the year 2018, we have the technology to provide text such as "no comments found" or "no comments" or perhaps "be the first to comment on this article". Yet the ACM, the self-proclaimed organization that "delivers resources that advance computing as a science and a profession" cannot bring itself to use any of those phrases. Instead, it allows the underlying CMS driving its web site to bleed out to the user.

A darker thought is that the ACM cares little for comments. It knows that it has to have them, to satisfy some need for "user engagement", but it doesn't really want them. That philosophy is consistent with the academic mindset of "publish and cite", in which citations to earlier publications are valued, but comments from random readers are not.

Yet the rest of the world (that is, people outside of academe) care little for citations and references. They care about opinions and information (ad profits). Comments are an ongoing problem for web sites; few are informative and many are insulting, and many web sites have abandoned comments.

ACM hasn't disabled its comments, but it hasn't encouraged them either. It sits in the middle.

This is why the ACM struggles with its outreach to the non-academic world.

Thursday, April 19, 2018

Why no language to replace SQL?

The history of programming is littered with programming languages. Some endure for ages (COBOL, C, Java) and some live briefly (Visual J++). We often develop new languages to replace existing ones (Perl, Python).

Yet one language has endured and has seen no replacements: SQL.

SQL, invented in the 1970s and popularized in the 1980s, has lived a good life with no apparent challengers.

It is an anomaly. Every language I can think of has a "challenger" language. FORTRAN was challenged by BASIC. BASIC was challenged by Pascal. C++ was challenged by Java; Java was challenged by C#. Unix shell programming was challenged by AWK, which in turn was challenged by Perl, which in turn has been challenged by Python.

Yet there have been no (serious) challengers to SQL. Why not?

I can think of several reasons:
  • Everyone loves SQL and no one wants to change it.
  • Programmers think of SQL as a protocol (specialized for databases) and not a programming language. Therefore, they don't invent a new language to replace it.
  • Programmers want to work on other things.
  • The task is bigger than a programming language. Replacing SQL means designing the language, creating an interpreter (or compiler?), command-line tools (these are programmers, after all), bindings to other languages (Python, Ruby, and Perl at minimum), and data access routines. With all features of SQL, including triggers, access controls, transactions, and audit logs.
  • SQL gets a lot of things right, and works.
I'm betting on the last. SQL, for all of its warts, is effective, efficient, and correct.

But perhaps there is a challenger to SQL: NoSQL.

In one sense, NoSQL is a replacement for SQL. But it is a replacement of more than the language -- it is a replacement of the notion of data structure. NoSQL "databases" store documents and photographs and other things, but they are rarely used to process transactions. NoSQL databases don't replace SQL databases, they complement them. (Some companies move existing data from SQL databases to NoSQL databases, but this is data that fits poorly in the relational structure. They move some of their data but not all of their data out of the SQL database. These companies are fixing a problem, not replacing the SQL language.)

NoSQL is a complement of SQL, not a replacement (and therefore not a true challenger). SQL handles part of our data storage and NoSQL handles a different part.

It seems that SQL will be with us for some time. It is tied to the notion of relational organization, which is a useful mechanism for storing and processing homogeneous data.

Wednesday, April 11, 2018

Big enough is big enough

Visual Studio has a macro capability, but you might never have used it. You might not even know that it exists.

You see, you cannot use it as Visual Studio comes "out of the box". The feature is disabled. You have to take action before you can use it.

First, there is a setting inside of Visual Studio to enable macros.

Second, there is a setting inside of Windows to allow Visual Studio macros. Only system administrators can enable it.

Yes, you read that right. There are two settings to enable macros in Visual Studio, and both must be enabled to run macros.

Why? I'm not sure, but my guess is that the Visual Studio setting was there all along, allowing macros if users wanted them. The second setting (inside Windows) was added later, as a security feature.

The second setting was needed because the macro language inside of Visual Studio is powerful. It can call Windows API functions, instantiate COM objects, and talk to .NET classes. All of this in addition to the expected "insert some text" and "move the insertion point" we expect of a text editor macro.

Visual Studio's macro language is the equivalent of an industrial-strength cleaning solvent: So powerful that it can be used only with great care. And one is always at risk of a malevolent macro, sent from a co-worker or stranger.

But macros don't have to be this way.

The Notepad++ program, a text editor for Windows (not an IDE), also has macro capabilities. Its macro capability is much simpler than that of Visual Studio: it records keystrokes and plays them back. It can do anything you, the user, can do in the program, and no more.

Which means, of course, that Notepad++'s macro capabilities are safe. They can do only the "normal" operations of a text editor; it's not possible to create a malevolent macro -- or send or receive one. (I guess the most malicious macro could be a "select all, delete, save-file" macro. It would be a nuisance but little else.)

The lesson? Macros that are "powerful enough" are, well, powerful enough. Macros that are "powerful enough to do anything" are, um, powerful enough to do anything, including things that are dangerous.

Notepad++ has macros that are powerful enough to do meaningful work. Visual Studio has macros that can do all sorts of things, much more than Notepad++, and apparently so powerful that they must be locked away from the "normal" user.

So Notepad++, with its relatively small macro capabilities, is usable, and Visual Studio, with its impressive and all-powerful capabilities (okay, that's a bit strong, but you get the idea) is *not* usable. Visual Studio's macros are too powerful for the average user, so you can't use them.

Something to think about when designing your next product.

Wednesday, April 4, 2018

Apple to drop Intel chips (or not)

The romance between Apple and Intel has come to an end.

In 2005, Apple announced that it was switching to Intel processors for its desktop and laptop computers. Previously it had used PowerPC chips, and the laptops were called "PowerBooks". The first Intel-based laptops were called "MacBooks".

Now, Apple has announced plans to design its own processors. I'm certain that the folks over at Intel are less than happy.

Looking forward, I think a number of people will be unhappy with this change, from open source advocates to developers to even Apple itself.

Open source advocates may find that the new Apple-processor MacBooks are unable to run operating systems other than Apple's, which means that Linux will be locked out of the (new) Apple hardware. While only a miniscule number of people actually replace macOS with Linux (disclosure: I'm one) those who do may be rather vocal about the change.

Apple MacBooks are popular with developers. (Exactly why this is the case, I am not sure. I dislike the MacBook's keyboard and display, and prefer other equipment for my work. But maybe I have preferences different from most developers.)

Getting back to developers: They like Apple MacBooks. Look inside any start-up or small company, and MacBooks dominate the office space. I'm sure that part of this popularity is from Apple's use of a BSD-derived Unix as the base for macOS, which lets MacBook users run most Linux software.

When Apple switches from Intel to its own (probably proprietary) processor, will those utilities be available?

The third group affected by this change will be Apple itself. They may find that the development of processors is harder than they expect, with delays and trade-offs necessary for performance, power efficiency, security, and interfaces to other system components. Right now, Apple outsources those headaches to Intel. Apple may not like the decisions that Intel makes (after all, Intel serves other customers and must accommodate their needs as well as Apple's) and it may feel that control over the design will reduce those headaches.

In-sourcing the design of processors may reduce headaches... or it may simply move them. If Apple has been dissatisfied with Intel's delivery schedule for new chips, the new arrangement may simply mean that Apple management will be dissatisfied with their internal division's delivery schedule for new chips. Owning the design process may give Apple more control over the process but not total control over it.

The move from standard, well-known processors to proprietary and possibly not well-understood processors moves Apple away from the general market and into their own space. Apple desktops and laptops may become proprietary and secret, with Apple processors and Apple systems-on-a-chip and Apple operating systems and Apple drivers and Apple software, ... and only Apple able to upgrade, repair, or modify them.

That's a bit of a longshot, and I don't know that it will happen. Apple management may find the idea appealing, hoping for increased revenue. But it is a move towards isolationism, away from the "free trade" market that has made PCs popular and powerful. It's also a move to the market before the IBM PC, when small computers were not commodities but very different from each other. I'm not sure that it will help Apple in the long run.


Tuesday, March 27, 2018

A mainframe technique for improving web app performance

I've been working on a modern web application, making changes to improve its performance. For one improvement, I used an old technique from the days of mainframe processing.

I call it the "master file update" algorithm, although others may use different names.

It was commonly used when mainframe computers read data from tapes (or even punch cards). The program was to update a master file (say, account numbers and balances) with transactions (each with account numbers and transaction amount). The master file could contain bank accounts, or insurance policies, or something similar.

Why did mainframes use this technique? In short, they had to. The master file was stored on a magnetic tape, and transactions were on another tape (or perhaps punch cards). Both files were sequential-access files, and the algorithm read each file's records in sequence, writing a new master file along the way. Sequential access is the only way to read a file on magtape.

How did I use this technique? Well, I wasn't reading magnetic tapes, but I was working with two sets of data, one a "master" set of values and the other a set of "transactions". The original (slow) algorithm stored both sets of data in dictionary structures and required access by key values. Each access required a lookup into the dictionary. While the access was handled by the data structure's class, it still required work to find the correct value for each update.

The revised algorithm followed the mainframe technique: store the values in lists, not dictionaries, and make a single pass through the two sets of data. Starting with the first master value and the first transaction value, walk forward through the master values until the keys match. When the keys match, update the value and advance to the next transaction value. Repeat until you reach the end of both lists.
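
Here is a minimal sketch of the idea in C++ (the record layouts are invented; the point is the single forward pass over two key-ordered lists):

    #include <cstddef>
    #include <vector>

    struct Master      { int account; double balance; };
    struct Transaction { int account; double amount;  };

    // Both vectors must already be sorted by account number.
    // Walk the master list once, applying each transaction when its key is reached.
    void apply_transactions(std::vector<Master>& masters,
                            const std::vector<Transaction>& transactions) {
        std::size_t m = 0;
        for (const Transaction& t : transactions) {
            while (m < masters.size() && masters[m].account < t.account) {
                ++m;                                 // advance the "master tape"
            }
            if (m < masters.size() && masters[m].account == t.account) {
                masters[m].balance += t.amount;      // matching key: update in place
            }
            // else: a transaction with no matching master record; a real system
            // would report it, this sketch simply skips it
        }
    }

    int main() {
        std::vector<Master> masters = { {100, 50.0}, {200, 75.0}, {300, 10.0} };
        std::vector<Transaction> txns = { {100, -20.0}, {300, 5.0} };
        apply_transactions(masters, txns);
        return (masters[0].balance == 30.0 && masters[2].balance == 15.0) ? 0 : 1;
    }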

Dictionaries are good for "random access" to data. When you have to read (or update) a handful of values, and your updates are in no particular order, a dictionary structure is useful. There is a cost, as the dictionary has to find the item you want to read or update. Our situation was such that the cost of the dictionary was affecting our performance.

Lists are simpler structures than dictionaries. They don't have to search for the data; you move to the item in the list and read its value. The result is that read and write operations are faster. The change for a single operation was small; multiplied by the number of operations the change was a modest, yet noticeable, improvement.

Fortunately for me, the data was easily converted to lists, and it was in the right sequence. The revised code was faster, which was the goal. And the code was still readable, so there was no downside to the change.

The lesson for the day? Algorithms are usually not dependent on technology. The "master file update" algorithm is not a "mainframe technique", suitable only for mainframe applications. It works quite well in web apps. Don't assume that you must use only "web app" algorithms for web apps, and only "cloud app" algorithms for cloud apps. Such assumptions can blind you to good solutions. Keep your options open, and use the best algorithm for the job.

Tuesday, March 13, 2018

Programming languages and character sets

Programming languages are similar, but not identical. Even "common" things such as expressions can be represented differently in different languages.

FORTRAN used the sequence ".LT." for "less than", normally (today) indicated by the sign <, and ".GT." for "greater than", normally the sign >. Why? Because in the early days of computing, programs were written on punch cards, and punch cards used a small set of characters (uppercase alpha, numeric, and a few punctuation). The signs for "greater than" and "less than" were not part of that character set, so the language designers had to make do with what was available.

BASIC used the parentheses to denote both function arguments and variable subscripts. Nowadays, most languages use square brackets for subscripts. Why did BASIC use parentheses? Because most BASIC programs were written on Teletype machines, large mechanical printing terminals with a limited set of characters. And -- you guessed it -- the square bracket characters were not part of that set.

When C was invented, we were moving from Teletypes to paperless terminals. These new terminals supported the entire ASCII character set, including lowercase letters and all of the punctuation available on today's US keyboards. Thus, C used all of the symbols available, including lowercase letters and just about every punctuation symbol.

Today we use modern equipment to write programs. Just about all of our equipment supports UNICODE. The programming languages we create today use... the ASCII character set.

Oh, programming languages allow string literals and identifiers with non-ASCII characters, but none of our languages require the use of a non-ASCII character. No languages make you declare a lambda function with the character λ, for example.

Why? I would think that programmers would like to use the characters in the larger UNICODE set. The larger character set allows for:
  • Greek letters for variable names
  • Multiplication (×) and division (÷) symbols
  • Distinct characters to denote templates and generics
C++ chose to denote templates with the less-than and greater-than symbols. The decision was somewhat forced, as C++ lives in the ASCII world. Java and C# have followed that convention, although it's not clear that they had to. Yet the decision has its costs; tokenizing source code is much harder with symbols that hold multiple meanings. Java and C# could have used the double-angle brackets (« and ») to denote generics.

I'm not recommending that we use the entire UNICODE set. Several glyphs (such as 'a') have different code points assigned (such as the Latin 'a' and the Cyrillic 'а') and having multiple code points that appear the same is, in my view, asking for trouble. Identifiers and names which appear (to the human eye) to be the same would be considered different by the compiler.

But I am curious as to why we have settled on ASCII as the character set for languages.

Maybe it's not the character set. Maybe it is the equipment. Maybe programmers (and more specifically, programming language designers) use US keyboards. When looking for characters to represent some idea, our eyes fall upon our keyboards, which present the ASCII set of characters. Maybe it is just easier to use ASCII characters -- and then allow UNICODE later.

If that's true (that our keyboard guides our language design) then I don't expect languages to expand beyond ASCII until keyboards do. And I don't expect keyboards to expand beyond ASCII just for programmers. I expect programmers to keep using the same keyboards that the general computing population uses. In the US, that means ASCII keyboards. In other countries, we will continue to see ASCII with accented characters, Cyrillic, and special keyboards for oriental languages. I see no reason for a UNICODE-based keyboard.

If our language shapes our thoughts, then our keyboard shapes our languages.

Tuesday, March 6, 2018

My Technology is Old

I will admit it... I'm a sucker for old hardware.

For most of my work, I have a ten-year-old generic tower PC, a non-touch (and non-glare) 22-inch display, and a genuine IBM Model M keyboard.

The keyboard (a Model M13, to be precise) is the old-style "clicky" keyboard with a built-in TrackPoint nub that emulates a mouse. It is, by far, the most comfortable keyboard I have used. It's also durable -- at least thirty years old and still going, even after lots of pounding. I love the shape of the keys, the long key travel (almost 4 mm), and the loud clicky sound on each keypress. (Officemates are not so fond of the last.)

For other work, I use a relatively recent HP laptop. It also has a non-glare screen. The keyboard is better than most laptop keyboards these days, with some travel and a fairly standard layout.

I prefer non-glare displays to the high-gloss touch displays. The high-gloss displays are quite good as mirrors, and reflect everything, especially lamps and windows. The reflections are distracting; non-glare displays prevent such disturbances.

I use an old HP 5200C flatbed scanner. Windows no longer recognizes it as a device. Fortunately, Linux does recognize it and lets me scan documents without problems.

A third workstation is an Apple PowerBook G4. The PowerBook is the predecessor to the MacBook. It has a PowerPC processor, perhaps 1GB of RAM (I haven't checked in a while), and a 40 GB disk. As a laptop, it is quite heavy, weighing more than 5 pounds. Some of the weight is in the battery, but a lot is in the case (aluminum), the display, and the circuits and components. The battery still works, and provides several hours of power. It holds up better than my old MacBook, which has a battery that lasts for less than two hours. The PowerBook also has a nicer keyboard, with individually shaped keys as opposed to the MacBook's flat keycaps.

Why do I use such old hardware? The answer is easy: the old hardware works, and in some ways is better than new hardware.

I prefer the sculpted keys of the IBM Model M keyboard and the PowerBook G4 keyboard. Modern systems have flat, non-sculpted keys. They look nice, but I buy keyboards for my fingers, not my eyes.

I prefer the non-glare screens. Modern systems provide touchscreens. I don't need to touch my displays; my work is with older, non-touch interfaces. A touchscreen is unnecessary, and it brings the distracting high-glare finish with it. I buy displays for my eyes, not my fingers.

Which is not to say that my old hardware is without problems. The PowerBook is so old that modern Linux distros can run only in text mode. This is not a problem, as I have several projects which live in the text world. (But at some point soon, Linux distros will drop support for the PowerPC architecture, and then I will be stuck.)

Could I replace all of this old hardware with shiny new hardware? Of course. Would the new hardware run more reliably? Probably (although the old hardware is fairly reliable.) But those are minor points. The main question is: Would the new hardware help me be more productive?

After careful consideration, I have to admit that, for me and my work, new hardware would *not* improve my productivity. It would not make me type faster, or write better software, or think more clearly.

So, for me, new hardware can wait. The old stuff is doing the job.

Wednesday, February 28, 2018

Backwards compatibility but not for Python

Programming languages change over time. (So do the natural, human-spoken languages. But let's stick to the programming languages.)

Most languages are designed, and later changed, with care to avoid breaking older programs. This is a tradition from the earliest days of programming. New versions of FORTRAN and COBOL were introduced with new features, yet the newer compilers accepted the older programs. (Probably because the customers of the expensive computers would be very mad to learn that an "upgrade" had broken their existing programs.)

Since then, almost every language has followed this tradition. BASIC, Pascal, dBase (and Clipper and XBase), Java, Perl, ... they all strove for (and still strive for) backwards compatibility.

The record is not perfect. A few exceptions do come to mind:
  • In the 1990s, multiple releases of Visual Basic broke compatibility with older versions as Microsoft decided to improve the syntax.
  • Early versions of Perl changed syntax. Those changes were Larry Wall deciding on improvements to the syntax.
  • The C language changed syntax for the addition-assignment and related operators (from =+ to +=) which resolved an ambiguity in the syntax.
  • C++ broke compatibility with a scoping change in "for" statements. That was its only such change, to my knowledge.
These exceptions are few. The vast history of programming languages shows compatibility from old to new versions.

But there is one language that is an exception.

That language is Python.

Python has seen a number of changes over time. I should say "Pythons", as there are two paths for Python development: Python 2 and Python 3. Each path has multiple versions (Python 2.4, 2.5, 2.6, and Python 3.4, 3.5, 3.6, etc.).

The Python 3 path was started as the "next generation" of Python interpreters, and it was started with the explicit statement that it would not be compatible with the Python 2 path.

Not only are the two paths different (and incompatible), versions within each path (or at least the Python 3 path) are sometimes incompatible. That is, some things in Python 3.6 are different in Python 3.7.

I should point out that the changes between versions (Python 3.6 and 3.7, or even Python 2 and 3) are small. Most of the language remains the same across versions. If you know Python 2, you will find Python 3 familiar. (The familiarity may cause frustration as you stumble across one of the compatibility-breaking changes, though.)

Should we care? What does it mean for Python? What does it mean for programming in general?

One could argue that changes to a programming language are necessary. The underlying technology changes, and programming languages must "keep up". Thus, changes will happen, either in many small changes or one big change. The latter often is a shift away from one programming language to another. (One could cite the transition from FORTRAN to BASIC as computing changed from batch to interactive, for example.)

But that argument doesn't hold up against other evidence. COBOL, for example, has been popular for transaction processing and remains so. C and C++ have been popular for operating systems, device drivers, and other low-level applications, and remain so. Their backwards-compatible growth has not appreciably diminished their roles in development.

Other languages have gained popularity and remain popular too. Java and C# have strong followings. They, too, have not been hurt by backwards-compatibility.

Python is an opportunity to observe the behavior of the market. We have been working on the assumption that backwards-compatibility is desired by the user base. This assumption may be a false one, and the Python approach may be a good start to observe the true desires of the market. If successful (and Python is successful, so far) then we may see other languages adopt the "break a few things" philosophy for changes.

Of course, there may be some demand for languages that keep compatibility across versions. It may be a subset of the market, something that isn't visible with only one language breaking compatibility, but only visible when more languages change their approach. If that is the case, we may see some languages advertising their backwards-compatibility as a feature.

Who knows? It may be that the market demand for backwards-compatibility will come from Python users. As Python gains popularity (and it is gaining popularity) and more individuals and organizations build Python projects, they may find Python's approach unappealing.

Let's see what happens!

Thursday, February 22, 2018

Variables are... variable

The nice (and sometimes frustrating) thing about different programming languages is that they handle things, well, differently.

Consider the simple concept of a "variable". It is a thing in a program that holds a value. One might think that programming languages agree on something so simple -- yet they don't.

There are four actions associated with variables: declaration, initialization, assignment, and reference (as in 'use the value', not a pointer-style reference).

A declaration tells the compiler or interpreter that a variable exists, and often specifies a type. Some languages require a declaration before a variable can be assigned a value or used in a calculation; others do not.

Initialization provides a value during declaration. This is a special form of assignment.

Assignment assigns a value, and is not part of the declaration. It occurs after the declaration, and may occur multiple times. (Some languages do not allow assignment after initialization.)

A reference to a variable is the use of its value, to compute some other value or to provide the value to a function or subroutine.

It turns out that different languages have different ideas about these operations. Most languages follow these definitions; the differences are in the presence or absence of these actions.

C, C++, and COBOL (to pick a few languages) all require declarations, allow for initialization, and allow for assignment and referencing.

In C and C++ we can write:

int i = 17;
i = 12;
printf("%d\n", i);

This code declares and initializes the variable i as an int with value 17, then assigns the value 12, then calls the printf() function to write the value to the console. COBOL has similar abilities, although the syntax is different.

Perl, Python, and Ruby (to pick different languages) do not have declarations and initialization but do allow for assignment and reference.

In Ruby we can write:

i = 12
puts i

This assigns the value 12 to i and then writes it to the console. Notice that there is no declaration and no type specified for the variable.

Astute readers will point out that Python and Ruby don't have "variables", they have "names". A name is a reference to an underlying object, and multiple names can point to the same object. Java and C# use a similar mechanism for non-trivial objects. The difference is not important for this post.
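
A small Python sketch makes the "names" behavior visible (the list contents here are just an illustration):

a = [1, 2, 3]
b = a            # b is a second name for the same list object
b.append(4)      # a change made through one name...
print(a)         # ...is visible through the other: [1, 2, 3, 4]
print(a is b)    # True: both names refer to one object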

BASIC (not Visual Basic or VB.NET, but old-school BASIC) is a bit different. Like Perl, Python, and Ruby it does not have declarations. Unlike those languages, it lets you write a statement that prints the value of an undeclared (and therefore uninitialized and unassigned) variable:

130 PRINT A

This is a construct that would cause a C compiler to emit errors and refuse to produce an executable. In the scripting languages, it would cause a run-time error. BASIC handles this with grace, providing a default value of 0 for numeric variables and "" for text (string) variables. (The AWK language also assigns a reasonable default value to uninitialized variables.)
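
In Python, for example, referencing a name before any assignment fails at run time (a minimal sketch):

x = 12       # assignment creates the name; no declaration is needed
print(x)     # reference: prints 12
print(y)     # NameError: name 'y' is not defined -- reference before assignment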

FORTRAN has an interesting mix of capabilities. It allows declarations but does not require them. Variables have a specific type, either integer or real. When a variable is listed in a declaration, it has the specified type; when a variable is not declared, it has a type based on the first letter of its name! (Names beginning with I through N are integers; all others are reals.)

Like BASIC, variables in FORTRAN can be referenced without being initialized. Unlike BASIC, it does not provide default values. Instead it blissfully uses whatever values are in memory at the location assigned for the variable. (COBOL, C, and C++ have this behavior too.)

What's interesting is the trend over time. Let's look at a summary of languages and their capabilities, and the year in which they were created:

Languages which require declaration but don't force initialization

COBOL (1950s)
Pascal (1970s)
C (1970s)
C++ (1980s)
Java (1995)
C# (2000s)
Objective-C (1980s)

Languages which require declaration and require initialization (or initialize for you)

Eiffel (1980s)
Go (2010)
Swift (2014)
Rust (2015)

Languages which don't allow declarations and require assignment before reference

Perl (1987)
Python (1989)
Ruby (1990s)

Languages which don't require (or don't allow) declaration and allow reference before assignment

FORTRAN (1950s)
BASIC (1960s)
AWK (1970s)
PowerShell (2000s)

This list of languages is hardly comprehensive, and it ignores the functional programming languages completely. Yet it shows something interesting: there is no trend for variables. That is, languages in the 1950s required declarations (COBOL) or didn't (FORTRAN), and later languages require declaration (Go) or don't (Ruby). Early languages allow for initialization, as do later languages. Early languages allow for use-without-assignment, as do later languages.

Perhaps a more comprehensive list may show trends over time. Perhaps splitting out the different versions of languages will show convergence of variables. Or perhaps not.

It is possible that we (that is, programmers and language designers) don't really know how we want variables to behave in our languages. With more than half a century of experience we're still developing languages with different capabilities.

Or maybe we have, in some way, decided. It's possible that we have decided that we need languages with different capabilities for variables (and therefore different languages). If that is the case, then we will never see a single language become dominant.

That, I think, is a good outcome.


Tuesday, February 6, 2018

The IRS made me a better programmer

We US taxpayers have opinions of the IRS, the government agency tasked with the collection of taxes. Those opinions tend to be strong and tend to fall on the "not favorable" side. Yet the IRS did me a great favor and helped me become a better programmer.

The assistance I received was not through employment at the IRS, nor did they send me a memo entitled "How to be a better programmer". They did give me a piece of guidance, unrelated to programming, yet it turned out to be the most helpful advice on programming in my career.

That advice was the simple philosophy: One operation at a time.

The IRS uses this philosophy when designing the forms for tax returns. There are a lot of forms, and some cover rather complex notions and operations, and all must be understandable by the average taxpayer. I've looked at these forms (and used a number of them over the years) and while I may dislike our tax laws, I must admit that the forms are as easy and understandable as tax law permits. (Tax law can be complex with intricate concepts, and we can consider this complexity to be "essential" -- it will be present in any tax form no matter how well you design it.)

Back to programming. How does the philosophy of "one operation at a time" change the way I write programs?

A lot, as it turns out.

The philosophy of "one operation at a time" is directly applicable to programming. Well, my programming, at least. I had, over the years, developed a style of combining operations onto a single line.

Here is a simplified example of my code, using the "multiple operations" style:

Foo harry = y.elements().iterate().select('harry')

It is concise, putting several activities on a single line. This style makes for shorter programs, but not necessarily more understandable programs. Shorter programs are better when the shortness is measured in operations, not raw lines. Packing a bunch of operations -- especially unrelated operations -- onto a single line is not simplifying a program. If anything, it is making it more complex, as we tend to assume that operations on the same line are somehow connected.

I changed my style. I shifted from multi-operation lines to single operation lines, and I was immediately pleased with the result.

Here's the example from above, but with the philosophy of one operation per line:

elements = y.elements()
Foo harry = nil
elements.each do |element|
  harry = element if element.name == 'harry'
end
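
For readers who prefer a runnable illustration, here is the same idea as a short Python sketch (the Container and Element classes are hypothetical stand-ins for the objects in my example):

class Element:
    def __init__(self, name):
        self.name = name

class Container:
    def elements(self):
        return [Element('tom'), Element('dick'), Element('harry')]

y = Container()

# one operation per line: each intermediate value can be inspected in a debugger
elements = y.elements()
harry = None
for element in elements:
    if element.name == 'harry':
        harry = element

print(harry.name)    # prints 'harry'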

I have found two immediate benefits from this new style.

The first benefit is a better experience when debugging. When stepping through the code with the debugger, I can examine intermediate values. Debuggers are line-oriented, and execute the single-line version all in one go. (While there are ways to force the debugger to execute each function separately, there are no variables to hold the intermediate results.)

The second benefit is that it is easier to identify duplicate code. By splitting operations onto multiple lines, I find it easier to spot duplicate sequences. Sometimes the code is not an exact duplicate, but the structure is the same. Sometimes only portions of the code are the same. I can refactor the duplicated code into functions, which simplifies the code (fewer lines) and consolidates common logic in a single place (one point of truth).

Looking back, I can see that my code is somewhat longer, in terms of lines. (Refactoring common logic reduces it somewhat, but not enough to offset the expansion of multiline operations.)

Yet the longer code is easier to read, easier to explain to others, and easier to fix. And since the programs I am writing are much smaller than the computer's capabilities, there is little expense in slightly longer programs. I suspect that compilers (for languages that use them) are optimizing a lot of my "one at a time" operations and condensing them, perhaps better than I can. The executables produced are about the same size as before. Interpreters, too, seem to have little problem with multiple simple statements, and run the "one operation" version of programs just as fast as the "multiple operations" version. (This is my perception; I have not conducted formal time trials of the two versions.)

Simpler code, easier to debug, and easier to explain to others. What's not to like?

Wednesday, January 31, 2018

Optimizing in the wrong direction

Back in the late 2000s, I toyed with the idea of a new version control system. It wasn't git, or even git-like. In fact, it was the opposite.

At the time, version control was centralized. There was a single instance of the repository and you (the developer) had a single "snapshot" of the files. Usually, your snapshot was the "tip", the most recent version of each file.

My system, like other version control systems of the time, was a centralized system, with versions for each file stored as 'diff' packages. That was the traditional approach for version control, as storing a 'diff' was smaller than storing the entire version of the file.
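
A rough Python sketch shows why delta storage was attractive (the file contents here are invented, and real systems are more sophisticated than this):

import difflib

v1 = ['line %d\n' % n for n in range(1000)]   # version 1: 1000 lines
v2 = list(v1)
v2[500] = 'a changed line\n'                  # version 2: one small edit

delta = list(difflib.unified_diff(v1, v2, 'v1', 'v2'))

print(len(''.join(v1)))      # characters to store the full file again
print(len(''.join(delta)))   # characters to store just the diff -- far fewer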

Git changed the approach for version control. Instead of a single central repository, git is a distributed version control system. It replicates the entire repository in every instance and uses a sophisticated protocol to synchronize changes across instances. When you clone a repo in git, you get the entire repository.

Git can do what it does because disk space is now plentiful and cheap. Earlier version control systems worked on the assumption that disk space was expensive and limited. (Which, when SCCS was created in the 1970s, was true.)

Git is also directory-oriented, not file-oriented. Git looks at the entire directory tree, which allows it to optimize operations that move files or duplicate files in different directories. File-oriented version control systems, looking only at the contents of a single file at a time, cannot make those optimizations. That difference, while important, is not relevant to this post.

I called my system "Amnesia". My "brilliant" idea was to remove diffs from the repository over time and thereby use even less disk space. Deletion was automatic, and I let the user specify a set of rules for deletion, so important versions could be saved indefinitely.
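
As a purely hypothetical sketch (the "keep the newest few" policy here is invented for illustration; the design only specified user-defined rules), such a rule might have looked like this:

from dataclasses import dataclass

@dataclass
class Version:
    number: int
    tagged: bool     # marked by the user as "keep forever"

def versions_to_keep(versions, keep_recent=5):
    # keep anything the user tagged, plus the newest few versions
    recent = {v.number for v in sorted(versions, key=lambda v: v.number)[-keep_recent:]}
    return [v for v in versions if v.tagged or v.number in recent]

history = [Version(n, tagged=(n == 3)) for n in range(1, 20)]
print([v.number for v in versions_to_keep(history)])   # [3, 15, 16, 17, 18, 19]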

My improvement was based on the assumption that disk space was expensive. Looking back, I should have known better. Disk space was not expensive, and not only was it not expensive, it was not getting more expensive -- it was getting cheaper.

Anyone looking at this system today would be, at best, amused. Even I can only grin at my error.

I was optimizing, but for the wrong result. The "Amnesia" approach reduced disk space, at the cost of time (it takes longer to compute diffs than it does to store the entire file), information (the removal of versions also removes information about who made the change), and development cost (for the auto-delete functions).

The lesson? Improve, but think about your assumptions. When you optimize something, do it in the right direction.

Wednesday, January 24, 2018

Cloud computing is repeating history

A note to readers: This post is a bit of a rant, driven by emotion. My 'code stat' project, hosted on Microsoft Azure's web app PaaS platform, has failed and I have yet to find a resolution.

Something has changed in Azure, and I can no longer deploy a new version to the production servers. My code works; I can test it locally. Something in the deployment sequence fails. This is a test project, using the free level of Azure, which means no monthly costs but also means no support -- other than the community help pages.

There are a few glorious advances in IT, advances which stand out above the others. They include the PC revolution (which saw individuals purchasing and using computers), the GUI (which saw people untrained in computer science using computers), and the smartphone (which saw lots more people using computers for lots more sophisticated tasks).

The PC revolution was a big change. Prior to personal computers (whether they were IBM PCs, Apple IIs, or Commodore 64s), computers were large, expensive, and complicated; they were especially difficult to administer. Mainframes and even minicomputers were large and expensive; an individual could afford one only if they were enormously wealthy and had lots of time to read manuals and try different configurations to make the thing work.

The consumer PCs changed all of that. They were expensive, but within the range of the middle class. They required little or no administration effort. (The Commodore 64 was especially easy: plug it in, attach to a television, and turn it on.)

Apple made the consumer PC easier to use with the Macintosh. The graphical user interface (lifted from Xerox PARC's Alto, and later copied by Microsoft Windows) made many operations and concepts consistent. Configuration was buried, and sometimes options were reduced to "the way Apple wants you to do it".

It strikes me that cloud computing is in a "mainframe phase". It is large and complex, and while an individual can create an account (even a free account), the complexity and time necessary to learn and use the platform are significant.

My issue with Microsoft Azure is precisely that. Something has changed and it behaves differently than it did in the past. (It's not my code, the change is in the deployment of my app.) I don't think that I have changed something in Azure's configuration -- although I could have.

The problem is that once you go beyond the 'three easy steps to deploy a web app', Azure is a vast and intimidating beast with lots of settings, each with new terminology. I could poke at various settings, but will that fix the problem or make things worse?

From my view, cloud computing is a large, complex system that requires lots of knowledge and expertise. In other words, it is much like a mainframe. (Except, of course, you don't need a large room dedicated to the equipment.)

The "starter plans" (often free) are not the equivalent of a PC. They are merely the same, enterprise-level plans with certain features turned off.

A PC is not simply a mainframe reduced to tabletop size. Both have CPUs and memory and peripheral devices and operating systems, but they are two different creatures. PCs have fewer options, fewer settings, fewer things you (the user) can get wrong.

Cloud computing is still at the "mainframe level" of options and settings. It's big and complicated, and it requires a lot of expertise to keep it running.

If we repeat history, we can expect companies to offer smaller, simpler versions of cloud computing. The advantage will be an easier learning curve and less required expertise; the disadvantage will be lower functionality. (Just as minicomputers were easier and less capable than mainframes and PCs were easier and less capable than minicomputers.)

I'll go out on a limb and predict that the companies who offer simpler cloud platforms will not be the current big providers (Amazon.com, Microsoft, Google). Mainframes were challenged by minicomputers from new vendors, not the existing leaders. PCs were initially constructed by hobbyists from kits. Soon after, companies such as Radio Shack, Commodore, and the newcomer Apple offered fully-assembled, ready-to-run computers. IBM offered the PC after the success of these upstarts.

The driver for simpler cloud platforms will be cost -- direct and indirect, mostly indirect. The "cloud computing is a mainframe" analogy is not perfect, as the billed costs for cloud platforms can be inexpensive. The expense is not in the hardware, but the time to make the thing work. Current cloud platforms require expertise, and expertise that is not cheap. Companies are willing to pay for that expertise... for now.

I expect that we will see competition to the big cloud platforms, and the marketing will focus on ease of use and low Total Cost of Ownership (TCO). The newcomers will offer simpler clouds, sacrificing performance for reduced administration cost.

My project is currently stuck. Deployments fail, so I cannot update my app. Support is not really available, so I must rely on the limited web pages and perhaps trial and error. I may have to create a new app in Azure and copy my existing code to it. I'm not happy with the experience.

I'm also looking for a simpler cloud platform.

Thursday, January 18, 2018

After Agile

The Agile project method was developed as an alternative to (one might say, a rebuttal of) Waterfall. Waterfall came first, aside from the proto-process of "do whatever we want" that preceded it. Waterfall had a revolutionary idea: Let's think about what we will do before we do it.

Waterfall can work with small and large projects, and small and large project teams. It offers fixed cost, fixed schedule, and fixed features. Once started, a project plan can be modified, but only through change control, a bureaucratic process that limits changes and broadcasts proposed changes to the entire team.

Agile, in its initial incarnation, was for small teams and projects with flexible schedules. The schedule may be fixed or variable; you can deliver a working product at any time. (Although you cannot know in advance which features will be in the delivered product.)

Agile has no change control process -- or rather, Agile is all about change control, allowing revisions to features at any time. Each iteration (or "sprint", or "cycle") starts with a conversation that involves stakeholders, who decide on the next set of features. Waterfall's idea of "think, talk, and agree before we act" is part of Agile.

So we have two methods for managing development projects. But two is an unreasonable number. In the universe, there are rarely exactly two of anything. Some things, such as electrons and stars and apples, exist in large quantities. Some things, such as the Hope Diamond and our planet's atmosphere, exist as singletons. (A few things do exist in pairs. But the vast majority of objects are either singles or multitudes.)

If software management methods exist as a multitude (for they are clearly not a singleton) then we can expect a third method after Waterfall and Agile. (And a fourth, and a fifth...)

What are the attributes of this new method? I don't know -- yet. But I have some ideas.

We need a management process for distributed teams, where the participants cannot meet in the same room. This issue is mostly about communication, and it includes differences in time zones.

We need a management process for large systems composed of multiple applications, or "systems of systems". Agile cannot handle projects of this size; Waterfall struggles with them as well.

Here are some techniques that I think will be in new management methods:
  • Automated testing
  • Automated deployment with automated roll-back
  • Automated evaluation of source code (lint, RuboCop, etc.)
  • Automated recording (and transcribing) of meetings and conversations
It is possible that new methods will use other terms and avoid the "Agile" term. I tend to doubt that. We humans like to name things, and we prefer familiar names. "Agile" was called "Agile" and not "Express Waterfall" because the founders wanted to emphasize the difference from the even-then reviled Waterfall method.

The Waterfall brand was tarnished -- and still is. Few folks want to admit to using Waterfall; they prefer to claim Agile methods. So I'm not expecting a "new Waterfall" method.

Agile's brand is strong; developers want to work on Agile projects and managers want to lead Agile projects. Whatever methods we devise, we will probably call them "Agile". We will use "Distributed Agile" for distributed teams, "Large Agile" for large teams, and maybe "Layered Agile" for systems of systems.

Or maybe we will use other terms. If Agile falls out of favor, then we will pick a different term, such as "Coordinated".

Regardless of the names, I'm looking forward to new project management methods.

Monday, January 1, 2018

Predictions for tech in 2018

Predictions are fun! Let's have some for the new year!

Programming Languages

Java, C, and C# will remain the most popular languages, especially in large commercial efforts. Moderately popular languages such as Python and JavaScript will remain moderately popular. (JavaScript is one of the "three legs of web pages", along with HTML and CSS, so it is very popular for web page and front-end work.)

Interest in functional programming languages (Haskell, Erlang) will remain minimal, while I expect interest in Rust (which focuses on safety, speed, and concurrency) to increase.

Cloud and Mobile

The year 2017 was the year that cloud computing became the default for new applications, especially business applications. The platforms and tools available from the big providers (Amazon.com, Microsoft, Google, and IBM) make a convincing case. Traditional web applications hosted in in-house data centers will still be built for some specialty applications.

The front end for applications remains split between browsers and mobile devices. Mobile devices are the platform of choice for consumer applications, including banking, sales, games, and e-mail. Browsers are the platform of choice for internal commercial applications, which require larger screens.

Browsers

Chrome will remain the dominant browser, possibly gaining market share. Microsoft will continue to support its Edge browser, and it has the resources to keep it going. Other browsers such as Firefox and Opera will be hard-pressed to maintain viability.

PaaS (Platform as a Service)

The middle version of platforms for cloud computing, PaaS sits between IaaS (Infrastructure as a Service) and SaaS (Software as a Service). It offers a platform to run applications, handling the underlying operating system, database, and messaging layers and keeping them hidden from the developer.

I expect an increase in interest in these platforms, driven by the increase in cloud-based apps. PaaS removes a lot of administrative work, for development and deployment.

AI and ML (Artificial Intelligence and Machine Learning)

Most of AI is actually ML, but the differences are technical and obscure. The term "AI" has achieved critical mass, and that's what we'll use, even when we're talking about Machine Learning.

Interest in AI will remain high, and companies with large data sets will take advantage of it. Initial applications will include credit analysis and fraud analysis (such applications are already under development). The platforms offered by Google, Microsoft, and IBM (and others) will make experimentation with AI possible for many, although one needs large data sets in addition to the AI compute platform.

Containers

Interest in containers will remain strong. Containers ease deployment; if you deploy frequently (or even infrequently) you will want to at least evaluate them.

Big Data

The term "Big Data" will all but disappear in 2018. Like its predecessor "real time", it was a vague description of computing that was beyond the reach of typical (at the time) hardware and software. Hardware and software improved to the point that performance was good enough, and the term "real time" is now limited to a few very specialized situations. I expect the same for "big data".

Related terms, like "data science" and "analytics", will remain. Their continued existence will depend on their perceived value to organizations; I think the latter has secured a place, while the former is still under scrutiny.

IoT

The "Internet of Things" will see a lot of hype in 2018. I expect a lot of internet-connected devices, from drones to dolls, from cameras to cars, and from bicycles to birdcages (really!).

The technology for connected devices has gotten ahead of our understanding, much like the original microcomputers before the IBM PC.

We don't know how to use connected things -- yet. I expect that we will experiment with a lot of uses before we find the "killer app" of IoT. Once we do, I expect that we will see a standardization of protocols for IoT devices, making the early devices obsolete.

Apple

I expect Apple to have a successful and profitable 2018. They remain, in my opinion, at risk of becoming the "iPhone company", with more than 80% of the income coming from phones. The other risk is from their aversion to cloud computing -- Apple puts compute power in its devices (laptops, tablets, phones, and watches) and does not leverage or offer cloud services.

The latter omission (lack of cloud services) will be a serious problem in the future. The other providers (Microsoft, Google, IBM, etc.) provide cloud services and development platforms. Apple stands alone, keeping developers on the local device and using cloud computing for its internal use.


These are my predictions for 2018. In short, I expect a rather dull year, focused more on exploring our current technology than creating new tech. We've got a lot of relatively new tech toys to play with, and they should keep us occupied for a while.

Of course, I could be wrong!