Thursday, June 11, 2020

The computer of Linus Torvalds

My experience as a developer ranges from solo artist to member of large enterprise projects. That experience has given me various insights about hardware, operating systems, programming languages, teamwork, and management.

One observation is about a combination of those aspects, specifically hardware and development teams: The minimum hardware requirements for a system are (most likely) the hardware that the developers are using. If you equip developers with top-of-the-line hardware, the system when delivered will require top-of-the-line hardware to run acceptably. As a corollary, if you equip developers with mid-line hardware, the delivered system will run acceptably on that level of hardware.

Developers may often complain about slow hardware, pointing out that top-level hardware is not that expensive, and may actually reduce expenses once you factor in the time spent paying developers to wait for slow compiles and tests. That is a valid point, but it loses sight of the larger goal: a system that performs well for users whose hardware is less than top-of-the-line.

With fast hardware, developers do not see the performance problems. With slower hardware, developers are aware of performance issues, and build a better system. (Or at least one that runs faster.)

Which brings us to Linus Torvalds, the chief developer for the Linux kernel. More specifically, his computer.

A recent article on Slashdot lists the specifications of his new computer. It sounds really nice. Fast. Powerful. And just what will lead Torvalds (and Linux) into the "performance trap". Such a computer will hide performance issues from Linus. That may send Linux in a direction that lets it run well on high-end hardware, and not so well on lower-end hardware or older systems.

With a high-end system to run and test on, Torvalds will miss the feedback when some changes have negative effects on performance on slower hardware. Those changes may work "just fine" on his computer, but not so well on other computers.

I recognize that the development effort of the Linux kernel has a lot of contributors, not all of whom have top-level hardware. Those developers may see performance issues. They may even raise them. But do they have a voice? Will their concerns be heard, and addressed? Or will Torvalds reject the issues as complaints and arrogantly tell those developers to get "real computers" and stop whining? His reputation suggests the latter.

If Torvalds does fall into the "performance trap" it may have significant effects on the future success of Linux. Linux may become "tuned" to high-performance hardware, running acceptably on expensive systems but slow and laggy on cheaper hardware. It may run well on new equipment but poorly on older systems.

That, in turn, may force users of older, slower hardware to re-think their decision to use Linux.


Thursday, May 28, 2020

After the quarantine, think

The 2020 quarantine, with its spate of "stay at home" orders and closure of offices, has enabled (forced?) many companies to implement "work from home" procedures that allow employees to, well, work from home. For some companies, this was a small change, as they already had procedures and infrastructure in place to allow employees to work from remote locations. For other companies, it was a big change.

As various parts of the country rescind the "stay at home" orders, companies are free to resume work as normal. It is my position that, instead of simply requiring employees to report to the office as before, companies (and their managers) should think about what is best for the company.

Companies now have experience with remote work. In the past, one reason to stay with "work at the office" (the traditional arrangement of everyone working in a single office) was that managers could not be sure that "work from home" (or, more generally, "work from anywhere") would work for the company. The lack of experience made such a change risky. That excuse is no longer valid. Companies now have several weeks of experience with remote work.

But I am not suggesting that companies blindly adopt "work from home" for all employees. Nor am I suggesting that companies abandon remote work and require employees to work in the office.

Instead, I recommend that managers review the performance of the past few weeks, identify the strengths and weaknesses of remote work, and agree on a plan for the future. Some companies may be happy with remote work and decide to continue with it. Other companies may revert to "work at the office". A third group will choose a middle ground, with some employees remote and others in the office, or remote work for a portion of the week.

I am sure that managers are aware of the costs of maintaining an office building, and will view the path of remote work as a way to reduce those costs. Remote work also allows for expansion of the workforce without a corresponding expansion (and cost) of office space.

"Work in the office" on the other hand allows for all work to be done in a single location, which may make it easier to interact with people. Face-to-face communication is more effective than e-mail, voice phone, and video calls. A single office building also keeps the IT infrastructure in one place, with no need (or cost) for remote access and the accompanying security.

The 2020 pandemic and quarantine gave us information about remote work. It would be foolish for managers to ignore that information when deciding how to run their company.

Thursday, May 21, 2020

The lessons we programmers learn

We in the programming industry learn from our mistakes, and we create languages to correct our mistakes. Each "generation" of programming language takes the good aspects of previous languages, drops the bad aspects, and adds new improved aspects. (Although we should recognize that "good", "bad", and "improved" are subjective.) Over time, our "programming best practices" are baked into our programming languages.

We learned that assembly language was specific to hardware and forced us to think too much about memory layouts, so we invented high-level languages. COBOL and FORTRAN allowed us to write programs that were portable across computers from different vendors and let us specify variables easily. (I won't say "memory management" here, as early high-level languages did not allow for dynamic allocation of memory the way C and C++ do.)

COBOL, FORTRAN, and BASIC (another high-level language) used GOTO statements and rudimentary IF statements for flow control. We learned that those statements lead to tangled code (some say "spaghetti code"), so we invented structured programming with its "if-then-else" and "do-while" statements. Pascal was one of the first languages to implement structured programming. (It retained the GOTO statement, but it was rarely needed.)

Structured programming was better than non-structured programming, but it was not sufficient for large systems. We learned that large systems need more than if-then-else and do-while to organize the code, so we invented object-oriented programming. The programming languages C++, Java, and C# became popular.

Designing new languages seems like a built-in feature of the human brain. And designing new languages that use the best parts of the old languages while replacing the mistakes of the old languages seems like a good thing.

But this arrangement bothers me.

Programmers who learn the whole trail of languages, from assembly to BASIC to C++ to Java, understand the weaknesses of the early languages and the strengths of later languages. But these programmers are few. Most programmers do not learn the whole sequence; they learn only the current languages which have pruned away all of the mistakes.

We programmers often look forward. We want the latest language, the newest database, the most recent operating system. In looking forward, we don't look back. We don't look at the older systems, and the capabilities that they had.

Those old systems (and languages) had interesting features. Their designers had to be creative to solve certain problems. Many of those solutions were discarded as hardware became more powerful and languages became more structured.

Is it possible that we have, in our enthusiasm to improve programming languages, discarded some ideas that are worthy? Have we thrown out a baby (or two) with the bathwater of poor language features?

If we don't look back, if we leave those abandoned features in the dust heap, how would we know?

Thursday, May 7, 2020

COBOL all the way down

Programming languages have changed over time. That's not a surprise. But what surprised me was one particular way in which languages have changed: the importance of libraries.

The first programming languages were designed to be complete. That is, a program or application built in that language would use only that language, and nothing else.

Programs built in COBOL (usually financial applications) use COBOL and nothing else. COBOL was built to handle everything. A COBOL program is COBOL, all the way down. (COBOL programs can use SQL, which was fitted into COBOL in the 1970s, but SQL is an exception.)

We saw a change in later languages. FORTRAN, BASIC, Pascal, and C provided functions in their run-time libraries. Most of the application was written in the top-level language, with calls to functions to perform low-level tasks such as trigonometric calculations or string operations.

The introduction of IBM's OS/2 and Microsoft's Windows also changed programming. The graphical operating systems provided a plethora of functions. There were functions for graphical output (to displays or printers), input devices (keyboards and mice), memory management, process management, and network functions. It was no longer sufficient to learn the language and its keywords; one had to learn the extra functions too.

Programming languages such as Java and C# provided more libraries and packages, and some of the libraries and packages handled nontrivial tasks. Libraries allowed for a collection of classes and functions, and packages allowed for a collection of libraries in a form that was easily deployed and updated. These additional packages required the programmer to know even more functions.

The trend has been not only an increase in the number of functions, but also the capabilities and sophistication of library functions and classes. Programming is, more and more, about selecting libraries, instantiating classes, and invoking functions, and less and less about writing the functions that perform the work.

We can see this trend continue in recent languages. Many applications in Python and R use libraries for a majority of the work. In Python and R, the libraries do the work, and the code acts more like plumbing, connecting classes and functions.

To put this succinctly:

Early programming languages assume that the processing of the application will occur in those languages. Libraries provide low-level operations, such as input-output operations. A COBOL application is a COBOL program with assistance from some libraries.

Recent programming languages assume that the processing will occur in libraries and not user-written code. The expectation is that libraries will handle the heavy lifting. A Python application is one or more libraries with some Python code to coordinate activities.
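To make the "plumbing" idea concrete, here is a minimal sketch (with made-up data) of a Python program in this style. The standard library's csv, statistics, and collections modules do the parsing and computing; the Python code merely connects them:

```python
import csv
import io
import statistics
from collections import Counter

# Hypothetical sales data standing in for a file on disk.
raw = io.StringIO("region,amount\neast,100\nwest,250\neast,175\n")

# The libraries perform the work; the Python code routes data between them.
rows = list(csv.DictReader(raw))                    # csv parses the input
amounts = [int(row["amount"]) for row in rows]
by_region = Counter(row["region"] for row in rows)  # Counter tallies regions

print(statistics.mean(amounts))    # statistics computes the average
print(by_region.most_common(1))    # the most frequent region
```

Almost none of the interesting work here is expressed in user-written Python; the program is a thin layer of coordination over library code.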

This change has profound impacts for the future of programming, from system architecture to hiring decisions. It won't be enough to ask a candidate to write code for a linked list or a bubble sort; instead one will ask about library capabilities. System design will depend more on libraries and less on programming languages.

Wednesday, April 22, 2020

Three levels of Python programming

Python programming is not always what we think it is. I now think of Python programming as having three levels, three distinct forms of programming.

The first level is what we typically think of as programming in Python. It is writing Python code. This is the impression one gets when one has an "introduction to Python" class. The first program of "Hello, World" is written in Python, as are the successive programs in the class. Programs become more complex, with the addition of functions and later classes to organize larger and larger programs.

In this level, all of the code is Python. It is Python from top to bottom. And it works, for simple applications.
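A sketch of this first level, using a deliberately simple, hypothetical program: every operation, down to the summing loop, is written out in Python.

```python
# Level one: all of the logic is hand-written Python, top to bottom.
def average(values):
    total = 0
    count = 0
    for value in values:   # an explicit loop, not a library call
        total += value
        count += 1
    return total / count

print(average([2, 4, 6, 8]))
```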

For some applications, it is not "Python all the way down". Some applications are complex. They must manage large quantities of data, and perform a significant number of calculations, and they must do it quickly. A Python-only solution is not a satisfactory solution, because Python is interpreted and slow.

At this point, programmers include carefully-constructed modules that perform calculations quickly. The modules "numpy" and "scipy" are the most common, but there are many others.

This is the second level of programming in Python. It is not often thought of as "programming in Python" or even "programming". It is more often thought of as "importing modules and using the classes and functions in those modules".

That mindset makes sense. This work is less about Python and more about knowing which modules are available and which functions they provide. The task of programming is different; instead of writing all of the code, one assembles a solution from pre-packaged modules and uses Python to connect the various pieces.

That is why I think of it as a second level of programming. It is a different type of programming, a different type of thinking. It is not "how can I write code?" but instead "what existing code will perform this computation?".
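The shift in thinking can be shown without third-party packages. The modules "numpy" and "scipy" are the usual level-two tools, but since they must be installed separately, this sketch uses a standard-library routine implemented in C (math.fsum) to make the same point: find existing code that performs the computation, rather than writing the loop yourself.

```python
import math

values = [0.1] * 10

# Level-one thinking: write the accumulation loop yourself.
total = 0.0
for v in values:
    total += v

# Level-two thinking: use existing, carefully-constructed code.
# math.fsum is implemented in C and tracks partial sums exactly.
print(total)              # accumulated floating-point rounding error
print(math.fsum(values))  # exactly 1.0
```

The hand-written loop is not only slower at scale; in this case it is also less accurate, because the library routine embodies expertise (exact float summation) that the straightforward loop lacks.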

Which brings us to the third level.

The third level of Python programming is building your own module. The existing Python modules, if they do what you need, are fast and effective. But if they do not do what you need, then they are not helpful.

Writing your own solution in Python will result in a slow program -- perhaps unacceptably slow. Therefore, as a last resort, one writes one's own module (in C or C++) and imports it into the main Python program.

This is, purists will argue, programming not in Python but in C or C++. They have a point -- it is writing C or C++ code.

But when the objective is to build a system to perform a specific task, and the top layer of the application is written in Python, then one can argue that the C code is merely an extension of the same application.

Or, one can think of the task as creating a system in multiple modules and multiple languages, not a single program in a single programming language, and using the best language for each piece of the system.
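A full C extension module is too long to show here, but the boundary-crossing at this third level can be sketched with the standard library's ctypes module, which calls into already-compiled C code directly. (The library lookup below is an assumption; "m" resolves to the platform's C math library, such as libm.so.6 on Linux.)

```python
import ctypes
import ctypes.util

# Locate and load the platform's compiled C math library.
path = ctypes.util.find_library("m")
libm = ctypes.CDLL(path if path else None)

# Declare cos()'s C signature so ctypes converts values correctly.
libm.cos.restype = ctypes.c_double
libm.cos.argtypes = [ctypes.c_double]

# The Python code is a thin wrapper; compiled C performs the computation.
print(libm.cos(0.0))
```

Whether one reaches compiled code through ctypes or through a hand-written extension module, the structure is the same: Python at the top, C underneath, and a deliberate decision about which language handles which piece of the system.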

Python programming (or systems development) is often less about coding in a particular language and more about solving problems. With Python, we have three levels at which we can solve those problems.

Thursday, April 16, 2020

Lessons from the 2020 pandemic

In the middle of the 2020 pandemic, we can look around and see that many companies have shifted from "work in the office" to "work from home". (Many companies, especially retail, restaurants, movie theaters, and entertainment venues, have closed completely, with no ability to work from home.)

For those companies that have made the change, we can look and wonder why they did not make this change earlier. While some companies offered limited "work from home" opportunities (and many companies offered nothing), the pandemic has forced companies to change. Why the sudden change?

Some observations:

Shifting from "work in the office" to "work from home" is possible when the infrastructure is present. The automation of work, starting with PC-based word processors (in the 1980s) and continuing with networks (in the 1990s) and then connected networks and high-speed internet in the home (in the 2000s), made remote work possible. But even with the infrastructure in place, office culture held that face-to-face interactions and work in the office were better than work from home.

Allowing your entire employee base (or a large percentage of it) to work from home is easy when a government order closes your office and forbids employees from working in it. Some work, even the small amount that gets done when working from home, is better than none.

Allowing your workforce to work from home is also easy when all other companies -- especially your competition -- are allowing their employees to work from home. Being "part of the crowd" reduces the risk (or the perceived risk) of such a change. With all companies making the change, the risk reverses: the oddball is not the company that shifts to "work from home" but the company that remains in the office.

Changing from "work in the office" to "work from home" is also easy when all of your employees make the change, instead of a few chosen workers. The typical approach to change (small pilot programs with a few employees) sets up the dynamics of "chosen" and "not chosen" employees, which can create resentment among the "not chosen" employees. When all employees shift to "work from home" it is clear that there is no favoritism and that "work from home" is not a reward for good behavior.


The change from "work in the office" to "work from home" did happen, for many companies, quickly and easily. Much of that ease came not from the risks themselves but from the change in the risk profile. The technology was in place, other companies were making the same change, all employees (or as many as practical) were involved, and the government was issuing orders that made "work in the office" impossible.

Looking forward, will companies shift back to "work in the office"? I suspect that the office culture of face-to-face interactions still holds, so that will pull managers towards a "work in the office" arrangement. In the other direction, no company wants to be first, especially when the risk of COVID-19 is still present. The decision to shift from "work from home" to "work in the office" will not be an easy one.

Tuesday, April 7, 2020

Tech debt considered possibly not harmful

Current wisdom holds that tech debt (poorly-implemented programs or sections of programs) is bad, and should be avoided. Much has been said about "clean code" and keeping the code in a good state of repair at all times. The Agile Development methodology recognizes the need for refactoring, to reduce tech debt.

Everyone agrees that good code is good (for the project and for the company) and bad code is bad (also, for the project and the company).

Except possibly me.

Which is to say, I am not convinced that every project should take steps to avoid or reduce tech debt. I am of the opinion that some projects should avoid or reduce tech debt, and other projects should not.

Which projects should avoid tech debt -- and which projects should allow it -- is an interesting question, and not always easy to answer. But the answers lie within another question: why is tech debt bad?

Tech debt is bad, we all agree (including myself), in that it increases the development cost. The forms that tech debt takes -- poorly written programs, older programming languages or programs that depend on old versions of interpreters or compilers -- slows the development of new features and fixes to existing features. Tech debt also makes for a brittle code base, such that a small change in one section can have large effects throughout the entire system. Thus, even the smallest of changes must be carefully analyzed, carefully designed, carefully implemented, carefully tested, carefully reviewed, and carefully tested again. Each of those "carefully" operations requires time and effort.

But preventing or fixing tech debt also has a cost. It diverts development resources from adding new features into fixing old code. That diversion can delay the implementation of new features (if you keep the size of the development team constant) or increase the cost of the development team (if you add members).

The decision to reduce tech debt depends on one thing, and one thing only: the value that the organization places on the software. And the value of software, while it can be calculated with the rules of accounting for capital expenditures and depreciation, is really dependent on how you use the software.

Any code base can have value from the following uses:
  • Using the application (that is, running it) for company business
  • Taking pieces of the code for use in other applications
  • Copying the design of the code for use in other applications (in a different language)
  • Selling the code to another organization

If you are actively using the software (and most likely maintaining it with small fixes and possibly large enhancements), then the effects of tech debt will drive up the development costs, and tech debt should be evaluated and, when reasonable, reduced.

If you are not using the software, but intend to use pieces of the software in other systems, then the cost of tech debt must be discounted. Only the pieces that will be transferred should be considered. The remaining pieces, which will be discarded, have no intrinsic value and you should not fix their tech debt.

A different calculation applies when transferring not code but design. Transferring a poor design from one system to another is maintaining a poor design. But it may be more effective to fix the tech debt on the receiving end, rather than in the source system.

If you are selling the code, and this is a one-time event, then I see little incentive to improve the code. Odds are that the purchaser will not evaluate the quality of the code, or provide a higher purchase price for the improved code.

If you sell code often -- and perhaps are in the business of selling code -- then your code is your product, and you should look to remove tech debt from your code. Your code is your offering to your customers, and your reputation is built on it.

The one scenario that I did not list is the decommissioning of software. If your software has a short life (and you must define "short" as it varies from organization to organization) with no use after that life, then any investment will have a limited time for a return. We don't fix cars that are about to be hauled off to the junkyard, and we shouldn't fix software that we are about to discard.

The decision to avoid or reduce tech debt depends on the future use of the software. For some systems, this is easy: long-lived code such as the Linux kernel, or Microsoft Word, or an accounting system, all benefit from reduced tech debt. Other, short-lived code (such as a short script that is discarded at the end of the day) gains little from refactoring and improvements.

The difficult part in this is determining the future of the software. But once you know that, you know how much effort you should put into the removal of tech debt.