Sunday, December 18, 2022

Moving fast and breaking things is not enough

Many have lauded the phrase "Move fast and break things". Uttered by Mark Zuckerberg, founder of Facebook, it became a rallying cry for developing at a fast pace. It is a rejection of the older philosophy of careful analysis, reviewed design, and comprehensive tests. And while its pace has appeal, it is clear that "move fast and break things", by itself, is not enough.

Moving fast and breaking things results in, obviously, broken things. Broken things can be useful (more on this later) but they are, well, broken. A broken web site does not help customers. A broken database does not produce end-of-month reports. A broken... you get the idea.

Clearly, the one thing that you must do after you break something is to fix it. The fix may be easy or may be difficult, depending on the nature of the failures that occurred. A developer, working in a private sandbox, can break things and then restore them to working order with a "revert" command to the version control system. (This assumes a version control system, which I think in 2022 is a reasonable assumption.)

Moving fast and breaking things in the production environment is most likely a larger problem. One cannot simply revert everything to last night's backup -- today's transactions must be maintained. So we can say that moving fast is safer in developer sandboxes and riskier in production. (Just about everything is riskier in production, I think.)

But breaking things and fixing them is not enough, either. There is little point in breaking something and then fixing it by putting things back exactly as they were.

As I see it, the point of breaking things (and fixing them) is to learn. One can learn about the system: its strengths and weaknesses, how errors are propagated, the dependencies of different components, and the information contained in logs.

With new information, one can fix a system and provide a solution that is better than the previous design. One can identify future areas for improvements. One can understand the limitations of external services and third-party libraries. That knowledge can be used to improve the system, to make it more resilient against failures, to make it more flexible for future enhancements.

So yes, by all means move fast and break things. But also fix things, and learn about the system.

Monday, November 21, 2022

More Twitter

Elon Musk has stirred quite the controversy with his latest actions at Twitter (namely, terminating employment of a large number of employees, terminating the contracts of a large number of contractors, and discontinuing many of Twitter's services). His decisions have been almost universally derided; it seems that the entire internet is against him.

Let's take a contrarian position. Let's assume -- for the moment -- that Musk knows what he is doing, and that he has good reasons for his actions. Why would he take those actions, and what is his goal?

The former is open to speculation. My thought is that Twitter is losing money (it is) and is unable to fill the gap between income and "outgo" with investments. Thus, Twitter must raise revenue or reduce spending, or some combination of both. While this fits with Musk's actions, it may or may not be his motivation. 

The question of Musk's goal may be easier to answer. His goal is to improve the performance of Twitter, making it profitable and either keeping the company or selling it. (We can rule out the goal of destroying the company.) Keeping Twitter gives Musk a large communication channel to lots of people (free advertising for Tesla?) and makes him a notable figure in the tech (software) community. If Musk can "turn Twitter around" (that is, make it profitable, whether he keeps it or sells it) he builds on his reputation as a capable business leader.

Reducing the staff at Twitter has two immediate effects. The first is obvious: reduced expenses. The second is less obvious: a smaller company with fewer teams, and therefore more responsive. Usually, a smaller organization can make decisions faster than a large one, and can act faster than a large one.

It is true that a lot of "institutional knowledge" can be lost with large decreases in staff. That knowledge ranges from the design of Twitter's core software and its databases to its processes for updates and its operations (keeping the site running). Yet a lot of knowledge can be stored in software (and database structures), and read by others if the software is well-written.

I'm not ready to bury Twitter just yet. Musk may be able to make Twitter profitable and keep a commanding presence in the tech space.

But I'm also not ready to build on top of Twitter. Musk's effort may fail, and Twitter may fail. I'm taking a cautious approach, using it for distributing and collecting non-critical information.

Wednesday, November 2, 2022

Twitter

Elon Musk has bought Twitter and started making changes. Lots of people have commented on the changes. Here are my thoughts.

Musk's actions are radical and seem reckless. (At least, they seem reckless to me.) Dissolving the board, terminating employment of senior managers, demanding that employees work 84-hour weeks to quickly implement a new feature (a fee for the blue 'authenticated' checkmark), and threatening to terminate the employment of employees who don't meet performance metrics are no way to win friends -- although it may influence people.

Musk may think that running Twitter is similar to running his other companies. But Tesla, SpaceX, and The Boring Company are quite different from Twitter.

Twitter has a number of components. It has software: the various clients that provide Twitter to devices and PCs, the database of tweets, the query routines that select the tweets to show to individuals, and the functions that inject ads from the advertising inventory into the viewed streams.

But notice that the database of tweets is not made by Twitter. It is made by Twitter's users. It is the user base that creates the tweets, not Twitter employees. (Nor are they mined from the ground or grown on trees.)

The risk that Twitter now faces is one of reputation. If the quality (or the perceived quality) of Twitter falls, people (users) will leave. And like all social media, the value of Twitter is mostly defined by how many other people are on the service. Facebook's predecessor MySpace learned this, as did MySpace's predecessor Friendster.

Social media is like a telephone. A telephone is useful when lots of people have them. If you were the only person on Earth with a phone, it would be useless to you. (Who could you call?) The more people who use Twitter, the more valuable it is.

Musk's actions are damaging Twitter's reputation. A number of people have already closed their accounts, and more are claiming that they will do so in the future. (Those future closures haven't occurred, and it is possible that those individuals will decide to stay on Twitter.)

As I see it, Twitter has technical problems (all companies do) but its larger issues are ones of management and leadership. Musk may have made some unforced errors that will drive away users, advertisers, employees, and future investors.

Thursday, October 20, 2022

The Next Big Thing

What will we see as the next big thing?

Let's look at the history of computer technology -- or rather, a carefully curated version of the history of computer technology.

The history of computing can be divided into eras: The mainframe era, the minicomputer era, the micro/PC era, and so forth. And, with careful editing, we can see that these eras have similar durations: about 15 years each.

Let's start with mainframe computers. We can say that their era ran from 1950 to 1965. Mainframe computers were (and still are) large, expensive computers capable of significant processing. They are housed in rooms with climate control and dedicated power. Significantly, mainframe computers are used by people only indirectly. In the mainframe age, programmers submitted punch cards which contained source code; the cards were fed into the computer by an operator (one who was allowed in the computer room); the computer compiled the code and ran the program; output was usually on paper and delivered to the programmer some time later. Mainframe computers also ran batch jobs to read and process data (usually financial transactions). Data was often read from magnetic tape, and output could be to magnetic tape (updated data) or paper (reports).

Minicomputers were popular from 1965 to 1980. Minicomputers took advantage of newer technology; they were smaller, less expensive, and most importantly, allowed for multiple users on terminals (either paper-based or CRT-based). The user experience for minicomputers was very different from the experience on mainframes. Hardware, operating systems, and programming languages let users interact with the computer in "real time"; one could type a command and get a response.

Microcomputers and Personal Computers (with text displays, and without networking) dominated from 1980 to 1995. It was the age of the Apple II and the IBM PC, computers that were small enough (and inexpensive enough) for an individual to own. They inherited the interactive experience of minicomputers, but the user was the owner and could change the computer at will. (The user could add memory, add disk, upgrade the operating system.)

Personal Computers (with graphics and networking) made their mark from 1995 to 2010. They made the internet available to ordinary people. Graphics made computers easier to use.

Mobile/cloud computers became dominant in 2010. Mobile devices without networks were not enough (the Palm Pilot and the Windows pocket computers never gained much traction). Even networked devices such as the original iPhone and the Nokia N800 saw limited acceptance. It was the combination of networked mobile device and cloud services that became the dominant computing model.

That's my curated version of computing history. It omits a lot, and it fudges some of the dates. But it shows a trend, one that I think is useful to observe.

That trend is: computing models rise and fall, with their typical life being fifteen years.

How is this useful? Looking at the history, we can see that the mobile/cloud computing model has been dominant for slightly less than fifteen years. In other words, its time is just about up.

More interesting is that, according to this trend (and my curated history is too pretty to ignore), something new should come along and replace mobile/cloud as the dominant form of computing.

Let's say that I'm right -- that there is a change coming. What could it be?

It could be any of a number of things. Deep-fake tech allows for the construction of convincing images of any subject. It could be virtual reality, or augmented reality. (The difference is nontrivial: virtual reality generates the entire scene, while augmented reality overlays images on the scene around us.) It could be watch-based computing.

My guess is that it will be augmented reality. But that's a guess.

Whatever the new thing is, it will be a different experience from the current mobile/cloud model. Each of the eras of computing had its own experience. Mainframes had an experience of separation and working through operators. Minicomputers had interactive experience, although someone else controlled the computer. Personal computers had interaction and the user owned the computer. Mobile/cloud let people hold computers in their hand and use them on the move.

Also, the next big thing does not eliminate the current big thing. Mobile/cloud did not eliminate web-based systems. Web-based systems did not eliminate desktop applications. Even text-mode interactive applications continue to this day. The next big thing expands the world of computing.

Wednesday, October 19, 2022

Businesses discover that cloud computing isn't magic

Businesses are now (just now, after more than a decade of cloud computing) discovering that cloud computing is not magic. That it doesn't make their computing cheap. That it doesn't solve their problems.

Some folks have already pointed this out. Looking back, it seems obvious: If all you have done is move your web-based system into cloud-based servers, why would things change? But they miss an important point.

Cloud computing is a form of computing, different from web-based applications and different from desktop applications. (And different from mainframe batch processing of transactions.)

A cloud-based system, to be efficient, must be designed for cloud computing. This means small independent services reading and writing to databases or other services, and everything coordinated through message queues. (If you know what those terms mean, then you understand cloud computing.)

Moving a web-based application into the cloud, unchanged, makes little sense. Or as much sense as moving a desktop-based application (remember those?) such as Word or Excel into the web, unchanged.

So why use cloud computing?

Cloud computing's strengths are redundancy, reliability, and variable power. Redundancy in that a properly designed cloud computing system consists of multiple services, each of which can be hosted on multiple (as in more than one per service) servers. If your system contains a service to perform address validations, that service could be running on one, two or seven different servers. Each instance does the same thing: examine a mailing address and determine the canonical form for that address.

The other components in your system, when they need to validate or normalize an address, issue a request to the validation service. They don't care which server handles the request.

Cloud systems are reliable because of this redundancy. A traditional web-based service would have one address validation server. If that server is unavailable, the service is unavailable for the entire system. Such a failure can lead to the entire system being unavailable.

Cloud systems have variable power. They can create additional instances of any of the services (including our example address validation service) to handle a heavy workload. Traditional web services, with only one server, can see slow response times when that server is overwhelmed with requests. (Sometimes a traditional web system would have more than one server for a service, but the number of servers is fixed and adding a server is a lengthy process. The result is the same: the allocated server or servers are overwhelmed and response time increases.)

Cloud services eliminate this problem by instantiating servers (and their services) as needed. When the address validation server is overwhelmed, the cloud management software detects it and "spins up" more instances. Good cloud management software works in the other direction too, shutting down idle instances.

Those are the advantages of cloud systems. But none of them are free; they all require that you build your system for the cloud. That takes effort.


Tuesday, October 11, 2022

Technical interviews

Businesses -- large businesses that have HR departments -- have a problem: They find it difficult to hire new staff.

The problem has a few aspects.

First is the processes that businesses have developed for hiring. Businesses have refined their processes over decades. They have automated the application process, they have refined the selection process to filter out the unqualified candidates, and they have documented job descriptions and made pay grades equitable. They have, in short, optimized the hiring process.

But they have optimized it for the pre-COVID market, in which jobs were few and applicants were plentiful. The selection processes have been designed to filter out candidates: to start with a large number of applications and through multiple steps, reduce that list to a manageable three (or five, or ten). The processes have been built on the assumption that many candidates wanted to work at the company, and were willing to undergo phone screens, interviews, and take-home tests.

The current market is a poor fit for these practices. Candidates are less willing to undergo day-long interviews. They demand payment for take-home tests (some of which can take hours). Candidates are especially reluctant to undergo the process multiple times, for multiple positions. The result is that companies cannot hire new staff. ("No one wants to work!" cry the companies, but a better description might be "Very few people are willing to jump through all of our hoops!")

One might think that companies could simply change their hiring processes. There is an obstacle to this: Human Resources.

Most people think that the purposes of Human Resources are to hire people, occasionally fire them, and administer wages and benefits. They miss an important purpose for HR: to keep the company out of court.

Human Resources is there to prevent lawsuits. Lawsuits from employees who claim harassment, candidates who were not hired, employees whose employment was terminated, employees who are unhappy with their annual performance review, ... you get the idea.

HR meets this objective by enforcing consistency. They administer consistent annual evaluations. They document employee performance prior to termination of employment. They define and execute consistent hiring practices.

Note that last item: consistent hiring practices. One of the ways that HR deflects lawsuits is by ensuring that hiring practices are consistent for all candidates (or all candidates in broad classes). Consistency is required not only across employees but also over time. A consistent approach (to hiring, performance review, or termination of employment) is a strong defense against claims of discrimination.

The suggestion that HR change its hiring practices goes against the "consistency" mandate. And HR has a good case for keeping its practices consistent.

Companies must balance the need for staff against the risk of lawsuits (from a change in practices). It is not an easy call, and one that should not be made lightly. And something to keep in mind: The job market may shift back to the previous state of "many candidates for few openings". Should a company adjust its practices for a shift in the market that may be temporary? Should it shift again when the market changes back?

I don't have simple, definite answers. Each company must find its own.

Wednesday, October 5, 2022

Success with C++

Having recently written on the possible decline of C++, it is perhaps only fair that I share a success story about C++. The C++ programming language is still alive, and still useful. I should know, because a recent project used C++, and successfully!

The project was to maintain and enhance an existing C++ program. The program was written by other programmers before I arrived, over a period of years. Most of the original developers were no longer on the project. (In other words, a legacy system.)

The program itself is small by today's standards, with fewer than 300,000 lines of source code. It also has an unusual design (by today's standards): the program calculates economic forecasts from a set of input data. It has no interaction with the user; the calculations are made with nothing more than the input data and the program logic.

We (the development team) have successfully maintained and enhanced this program by following some rules, and placing some constraints upon ourselves. The goal was to make the code easy to read, easy to debug, and easy to modify. We made some design decisions for performance, but only after our initial design was shown to be slow. These constraints, I think, were key to our success.

We use a subset of C++. The language is large and offers many capabilities; we pick only those that are necessary. We use classes. We rarely use inheritance. Instead, we build classes from composition. Thus, we have no problems with slicing of objects. (Slicing is an effect that can occur in C++ when copying or casting a derived class to a base class. It generally does not occur in other OOP languages.) A very small number of classes use inheritance, and in those cases we often want slicing.

We use STL but not BOOST. The STL (the Standard Template Library) is enough for our needs, and we use only what we need: strings, vectors, maps, and an occasional algorithm.

We followed the Java convention for files, classes, class names, and function names. That is, each class is stored in its own file. (In C++, we have two files: the header file and the source file.) The name of the file is the name of the class (with a ".h" or ".cpp" extension). Class names use camel case, with a capital letter at the beginning of each word, for names such as "Year" or "HitAdjustment". Function names use snake case, with all lower-case letters and underscores between words. This naming convention simplified a lot of our code. When creating objects, we could create an object of type Year and name it "year". (The older code used no naming conventions, and many classes had lower-case names, which meant that when creating an object of type "ymd" (for example) we had to pick a name like "my_ymd" and keep track mentally of what was a class name and what was a variable name.)

We do not use namespace directives. That is, we write no "using namespace std" (or any other using-directive). This forces us to specify the namespace for every class and function. While tedious, it has a benefit: one can see at a glance where each name comes from. There is no need to search through the code, or guess about a function.

We use operator overloading only for a few classes, and only when the operators are obvious. Most of our code uses function calls. This also reduces guesswork by developers.

We have no friend classes and no friend functions. (We could use them, but we don't need them.)

Our attitude towards memory management is casual. Current operating systems provide a 2 gigabyte space for our programs, and that is enough for our needs. (It has been so far.) We avoid pointers and dynamic allocation of memory. STL allocates memory for its objects, and we assume that it will manage that memory properly.

We do not use lambdas or closures. (We could use them, but we don't need them.)

We use spacing in our code to separate sections of code. We also use spacing to denote statements that are split across multiple lines. (A blank line before and a blank line after.)

We use simple expressions. This increases the number of source lines, which eases debugging (we can see intermediate results). We let the C++ compiler optimize expressions for "release" builds.

----

By using a subset of C++, and carefully picking which features make up that subset, we have successfully developed, deployed, and maintained a modest-sized C++ application.

These constraints are not traditionally considered part of the C++ language. We enforce them for our code. It provides us with a consistent style of code, and one that we find readable. New team members find that they can read and understand the code, which was one of our goals. We can quickly make changes, test them, and deploy them -- another goal.

These choices work for us, but we don't claim that they will work for other teams. You may have an application with a different design, a different user interface, or a different set of computations, and it may call for a different subset of C++.

I don't say that you should use these constraints on your project. But I do say this: you may want to consider some constraints for your code style. We found that these constraints let us move forward, slowly at first and then rapidly.