Wednesday, September 25, 2024

Back to the Office

Amazon.com is the latest in a series of companies to insist that employees return to the office ("Return to Office", or RTO).

Some claim that Amazon's motive (and by extension, the motive of any company that requires employees to work in the office) is really to reduce its workforce. The idea is that some employees would rather leave the company than work in the office, and enforcing office-based work is a convenient way to get them to leave. (Here, "convenient" means "without layoffs".)

I suspect that Amazon (and other companies) are not using RTO as a means to reduce their workforce. RTO may avoid severance payments and the publicity of layoffs, but it carries other risks. One risk is that the "wrong" number of employees may leave, either too many or too few. Another risk is that the "wrong" employees may leave: high performers may pursue other opportunities while poor performers stay. RTO also selects for compliance (those who stay are the ones who will follow orders) while the independent and confident individuals leave. That last effect is subtle, but I suspect that Amazon's management is savvy enough to understand it.

But while employers are smart enough not to use RTO as a workforce-reduction technique, they are still insisting upon it. I'm not sure that they have fully thought through the reasons they use to justify RTO. Companies have pretty much uniformly claimed that an office-based workforce is more productive, despite studies that show the opposite. Even without studies, employees can often get a feel for productivity, and they can tell that RTO does not improve it. Therefore, by claiming that RTO increases productivity, management loses credibility.

That loss of credibility may be minimal now, but it will hang over management for some time. And at some point there may be another crisis, similar to the COVID-19 pandemic, that forces companies to close offices. (That crisis may be another wave of COVID, or a different virus such as avian flu or mpox, or some other form of crisis. It may be worldwide, nationwide, or regional. But I fear that it is coming.)

Should another crisis occur, one that forces companies to close offices and ask employees to work from home, how will employees react? My guess is that some employees will reduce their productivity. The thinking goes: if working in the office improves productivity (and our managers insist that it does), then working at home must reduce productivity (and therefore I will deliver the reduced output the company insists must happen).

Corporate managers may get their wish (high productivity from working in the office), although not in the way that they want. By explaining the need for RTO in terms of productivity, they have set themselves up for a future loss of productivity when they need employees to work from home (or other locations).

Tuesday, September 17, 2024

Apple hardware is outpacing Apple software

Something interesting about the new iPhone 16: the software isn't ready. Specifically, the AI ("Apple Intelligence") enhancements promised by Apple are still only that: promises.

Apple can develop hardware faster than it can develop software. That's a problem.

Apple has had this problem for a while. The M1 Mac computers first showed this problem. Apple delivered the computers, with their integrated system-on-chip design and more efficient processing, but delivered no software to take advantage of that processing.

It may be that Apple cares little for software. They sell computers -- hardware -- and not software. And it appears that Apple has "outsourced" the development of applications: it relies on third parties to build applications for Macs and iPhones. Oh, Apple delivers some core applications, such as utilities to configure the device, the App Store to install apps, and low-powered applications such as Pages and Numbers. But there is little beyond that.

Apple's software development focuses on the operating system and features for each device: macOS and Pages for the Mac, iPadOS and Stage Manager for the iPad, iOS and FaceTime and Maps for the iPhone. Apple builds no database systems, provides lukewarm support and enhancements for the Xcode IDE, and offers few apps for the iPhone.

I suspect that Apple's ability to develop software has atrophied. Apple has concentrated its efforts on hardware (and done rather well) but has lost its way with software.

That explains the delay for Apple Intelligence on the iPhone. Apple spent a lot of time and effort on the project, and (I suspect) most of that was for the hardware. Updates to iOS for the new iPhone were (probably) fairly easy and routine. But the new stuff, the thing that needed a lot of work, was Apple Intelligence.

And it's late.

Thinking about the history of Apple's software, I cannot remember a similarly big feature added by Apple. There is FaceTime, which seems impressive, but I think the iPhone app is rather simple; most of the work is in the back end and the scalability of that back end. Stage Manager was (is) also rather simple. Even features of the Apple Watch such as fall detection and SOS calls are not that complex. Operating systems were not that difficult: the original iOS was new, but iPadOS is a fork of it, and watchOS (I think) is a fork as well.

Apple Intelligence is a large effort, a greenfield effort (no existing code), and one that is very different from past efforts. Perhaps it is not surprising that Apple is having difficulties.

I expect that Apple Intelligence will be delivered later than expected, and will have more bugs and problems than most Apple software.

I also expect to see more defects and exploits in Apple's operating systems. Operating systems are not particularly complex (they are as complex as one makes them), but their development and maintenance require discipline. One gets that discipline through constant development and constant monitoring of that development. It requires an appreciation of the importance of the software, and I'm not sure that Apple has that mindset.

If I'm right, we will see more and more problems with Apple software. (Slowly at first, and then all at once.) Recovery will require a change in Apple's management philosophy and probably the senior management team.

Sunday, September 8, 2024

Agile, Waterfall, and Risk

For some years (decades, really), software development has used an agile approach to project management. The Agile method uses short iterations that each focus on a single feature, with the entire team reviewing progress and selecting the feature for the next iteration. Over time, a complete system evolves. The advantage is that the entire team (programmers, managers, salespersons, etc.) learns about the business problem, the functions of the system, and the capabilities of the team. The team can change course (hence the name "agile") as it develops each feature.

Prior to Agile, for some years (decades, really), software development used the "waterfall" approach to project management. The Waterfall method starts with a set of requirements and a schedule, and moves through different phases for analysis, design, coding, testing, and deployment. The important aspect is the schedule. The Waterfall method promises to deliver a complete system on the specified date.

This last aspect of Waterfall is quite different from Agile. The Agile method makes no promise to deliver a completed system on a specific date. It does promise that each iteration ends with a working system that implements the features selected by the team. Thus, a system developed with Agile is always working -- although incomplete -- whereas a system developed with Waterfall is not guaranteed to work until the delivery date.

(It has been observed that while the Waterfall method promises a complete, working system on the specified delivery date, it is quite poor at keeping that promise. Many projects overrun both schedule and budget.)

Here is where risk comes into play.

With Agile, the risk is shared by the entire team; key among them are developers and managers. An agile project has no specified delivery date, but more often than not senior managers (those above the agile-involved managers) have a date in mind. (And probably a budget, too.) Agile projects can easily overrun these unstated expectations. When they do, the agile-involved managers are part of the group held responsible for the failure. Managers bear some risk.

But look at the Waterfall project. When a waterfall project fails (that is, runs over schedule or budget), the managers have a way to distance themselves from the failure. They can say (honestly) that they provided the developers with a list of requirements and a schedule (and a budget) and that the developers failed to meet the "contract" of the waterfall project. Managers can deflect the risk to the development team.

(For some reason, we rarely question the feasibility of the schedule, or the consistency and completeness of the requirements, or the budget assigned to the project. These are considered "good", and any delay or shortcoming is therefore the fault of the developers.)

Managers want to avoid risk -- or at least transfer it to another group. Therefore, I predict that in the commercial space, projects will slowly revert from Agile methods to Waterfall methods.

Thursday, August 1, 2024

Google search is broken, and we all suffer

Google has a problem. That problem is web search.

Google, the long-time leader in web search, recently modified its techniques to use artificial intelligence (AI). The attempt at AI-driven search has led to embarrassing results. One person asked how to keep cheese on pizza, and Google suggested using glue. Another asked about cooking spaghetti, and Google recommended gasoline.

The problem was that Google pointed its AI search engine at the entire web, absorbing posts from various sources. Some of those posts contained text that was a joke or sarcastic. A human would be able to tell that the entries were not to be used in search results, but Google's algorithm isn't human.

Google has rediscovered the principle of "garbage into a computer system yields garbage output".

One might think that Google could simply "pull the plug" on the AI search and revert to the older mechanisms it used in the past. But here, too, Google has a problem: the old search algorithms don't work (anymore).

Google started with a simple algorithm for search: count the links pointing to each page. This was a major leap forward in search; previous attempts were curated by hand. Over the years, web designers have "gamed" the Google web crawler to move their pages up in the results, and Google has countered with changes to its algorithm. The battle continues; there are companies that help with "Search Engine Optimization" or "SEO". Those optimizing companies have gotten quite good at tweaking web sites to appear high in search results. But the battle is lost. Despite Google's size (and clever employees), the SEO companies have won, and old-style Google search no longer shows meaningful results but mostly advertisement links.
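(As an aside, that original link-counting idea is simple enough to sketch. Below is a deliberately naive illustration in Python, using a made-up link graph; Google's actual PageRank weighted each link by the rank of the page it came from, and the modern ranking pipeline uses far more signals than links alone.)

    from collections import Counter

    # A made-up link graph: each page maps to the pages it links to.
    links = {
        "home.example":  ["shop.example", "blog.example"],
        "blog.example":  ["shop.example"],
        "other.example": ["shop.example", "blog.example"],
    }

    # Score each page by the number of pages linking to it -- the naive
    # "count the links" ranking described above.
    inbound = Counter(target for targets in links.values() for target in targets)

    for page, count in inbound.most_common():
        print(page, count)   # shop.example 3, blog.example 2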

SEO has changed Google search from a generic search engine into a sales-lead tool. If you want to purchase something, Google is a great way to find a good price. But if you want something else, Google is much less useful than it used to be. It is no longer a tool for answers to general questions.

That means that search, for the internet, is broken.

It's not completely broken. In fact, "broken" is too strong a word for the concept. Better choices might be "damaged" or "compromised", or even "inconsistent". Some searches work, and others don't.

Broken, damaged, or inconsistent, Google's search engine has suffered. Its reputation is reduced, and fewer people use it. That's a problem for Google, because the search results page is where it displays advertisements, and advertisements are Google's major source of income.

A broken Google search is a problem for us all, in two ways.

First, with Google search broken, we (all) must now find alternative means of answering questions. AI might help for some -- although I don't recommend it for recipes -- and that can be a partial replacement. Other search engines (Bing, Yahoo) may work for now, but I expect that they will succumb to the same SEO forces that broke Google. With no single reliable source of information, we must now turn to multiple sources (Stack Exchange, Red Hat web pages, and maybe the local library), which means more work for us.

Second, the defeat of the Google whale by the SEO piranhas is another example of "this is why we cannot have nice things". It is the tragedy of the commons, with individuals acting selfishly and destroying a useful resource. Future generations will look back, possibly in envy, at the golden age of Google and a single source of reliable information.

Monday, July 22, 2024

CrowdStrike, Windows blue screens, and the future

A small problem with CrowdStrike, a Windows security application, has caused a widespread problem with thousands, perhaps millions, of PCs running Windows.

Quite a few folks have provided details about the problem, and how it happened.

Instead of repeating those details, I have some ideas about what will happen: what will happen at Microsoft, and what will happen at all of the companies that use CrowdStrike.

Microsoft long ago divided Windows into two spaces: one space for user programs and another space for system processes. The system space includes device drivers.

Applications in the user space can do some things, but not everything. They cannot, for example, interact directly with devices, nor can they access memory outside of their assigned range of addresses. If they do attempt to perform a restricted function, Windows stops the program -- before it causes harm to Windows or another application.

User-space applications cannot cause a blue screen of death.
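A small sketch shows where that boundary sits. Assuming a Windows machine and a regular (non-administrator) account, the user-space Python program below asks for direct access to the raw system disk; Windows refuses the request before the process can touch the device, and nothing worse than an error message results.

    # Run as an ordinary user on Windows: opening the raw disk device is a
    # restricted operation, so Windows denies it and this process simply
    # receives an error -- no harm to the system, and no blue screen.
    try:
        with open(r"\\.\PhysicalDrive0", "r+b") as disk:
            disk.write(b"\x00")
    except PermissionError as err:
        print("Windows refused direct device access:", err)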

If an error in CrowdStrike caused a blue screen of death (BSOD), then CrowdStrike must run in the system space. This makes sense, as CrowdStrike must access a lot of things to identify attacks, things normal applications do not look at. CrowdStrike runs with elevated privileges.

I'm guessing that Microsoft, as we speak, is thinking up ways to restrict third-party applications, such as CrowdStrike, that must run with elevated privileges. Microsoft won't force CrowdStrike into the user space, but Microsoft also cannot allow CrowdStrike to live in the system space, where it can damage Windows. We'll probably see an intermediate space, one with more privileges than user-space programs but not all the privileges of system-space applications. Or perhaps application spaces with tailored privileges, each specific to the target application.

The more interesting future is for companies that use Microsoft Windows and applications such as CrowdStrike.

These companies are -- I imagine -- rather disappointed with CrowdStrike. So disappointed that they may choose to sue. I expect that management at several companies are already talking with legal counsel.

A dispute with CrowdStrike will be handled as a contract dispute. But I'm guessing that CrowdStrike, like most tech companies, specified arbitration in its contracts, and limited damages to the cost of the software.

Regardless of contract terms, if CrowdStrike loses, they could be in severe financial hardship. But even if they prevail, they could face a difficult future: some number of clients will move to other providers, which will reduce CrowdStrike's income.

Other companies will start looking seriously at the contracts from suppliers, and start making adjustments. They will want the ability to sue in court, and they will want damages if the software fails. When the maintenance period renews, clients will want a different set of terms, one that imposes risk upon CrowdStrike.

CrowdStrike will have a difficult decision: accept the new terms or face further loss of business.

This won't stop at CrowdStrike. Client companies will review terms of contracts with all of their suppliers. The "CrowdStrike event" will ripple across the industry. Even companies like Adobe will see pushback to their current contract terms.

Supplier companies that agree to changes in contract terms will have to improve their testing and deployment procedures. Expect to see a wave of interest in process management, testing, verification, static code analysis, and code execution coverage. And, of course, consulting companies and tools to help in those efforts.

Client companies may also review the licenses for open source operating systems and applications. They may also attempt to push risk onto the open source projects. This will probably fail; open source projects make their software available at no cost, so users have little leverage. A company can choose to replace Python with C#, for example, but the threat of "we will stop using your software and pay you nothing instead of using your software and paying you nothing" has little weight.

Therefore the shift in contracts will occur in the commercial space, not in open source -- at least not at first. That may change in the future, as new terms in the commercial space become the norm.

Thursday, June 6, 2024

What to do with an NPU

Microsoft announced "Copilot PC", a new standard for hardware. It includes a powerful Neural Processing Unit (NPU) along with the traditional (yet also powerful) CPU and GPU. The purpose of this NPU is to support Microsoft's Copilot+, an application that uses "multiple state-of-the-art AI models ... to unlock a new set of experiences that you can run locally". It's clear that Microsoft will add generative AI to Windows and Windows applications. (It's not so clear that customers want generative AI or "a new set of experiences" on their PCs, but that is a different question.)

Let's put Windows to the side. What about Linux?

Linux is, if I may use the term, a parasite. It runs on hardware designed for other operating systems (such as Windows, macOS, or even z/OS). I fully expect that it will run on these new "Copilot+ PCs", and when running, it will have access to the NPU. The question is: will Linux use that NPU for anything?

I suppose that before we attempt an answer, we should review the purpose of an NPU. A Neural Processing Unit is designed to perform calculations for a neural network. A neural network is a collection of nodes with connections between nodes. It has nothing to do with LANs or WANs or telecommunication networks.

The calculations of a neural network can be performed on a traditional CPU, but they are a poor match for the typical CPU. The calculations are a better match for a GPU, which is why so many people ran neural networks on them -- GPUs performed better than CPUs.

Neural Processing Units perform a specialized set of computations, and they are better at those calculations than GPUs (and much better than CPUs). So if we have a neural network, its calculations will run fastest on an NPU.
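To make "the calculations" concrete: the heart of a neural network's workload is long runs of multiply-and-add operations -- for each neuron, a dot product of inputs and weights, plus a bias, passed through a simple non-linearity. Here is a rough sketch in plain Python with made-up numbers (no AI libraries assumed); NPUs exist to run exactly this kind of arithmetic, at enormous scale, far faster than a CPU can.

    import random

    def dense_layer(inputs, weights, biases):
        # One fully connected layer: for each neuron, multiply-and-add the
        # inputs against that neuron's weights, add a bias, and apply a
        # simple non-linearity (ReLU).
        outputs = []
        for neuron_weights, bias in zip(weights, biases):
            total = sum(x * w for x, w in zip(inputs, neuron_weights)) + bias
            outputs.append(max(0.0, total))
        return outputs

    # Tiny made-up example: 4 inputs feeding a layer of 3 neurons.
    inputs  = [0.5, -1.2, 3.3, 0.7]
    weights = [[random.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(3)]
    biases  = [0.1, 0.0, -0.2]

    print(dense_layer(inputs, weights, biases))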

One application that uses those computations is the AI that we hear about today. And it may be that Linux, when detecting an NPU, will route computations to it, and those computations will be for artificial intelligence.

But Linux doesn't have to use an NPU for generative AI, or other commercial applications of AI. A neural network is, at its essence, a pattern-matching mechanism, and while the AI we know today is a pattern-matching application (and therefore well served by NPUs), it is not the only pattern-matching application. It is quite possible (and I think probable) that the open-source community will develop non-AI applications that take advantage of the NPU.

I suspect that this development will happen in Linux and in the open source community, and not in Windows or the commercial market. Those markets will focus on the AI that is being developed today. The open source community will drive the innovation of neural network applications.

We are early in the era of neural networks. So early that I think we have no good understanding of what they can do, what they cannot do, and which of those capabilities match our personal or business needs. We have yet to find the "killer app" of AI, the equivalent of the spreadsheet: VisiCalc made it obvious that computers were useful; once we had seen it, we could justify the purchase of a PC.


Thursday, May 16, 2024

Apple's Pentium moment

In the 1990s, as the market shifted from the 80486 processor to the newer Pentium processor, Intel had a problem. On some Pentium processors, certain floating-point division operations returned incorrect results. It was called the "FDIV bug". What made this a problem was that the error was detected only after a significant number of Pentium processors had been sold inside PCs.
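(For reference, a widely circulated check for the bug was a single division whose error was large enough to see with ordinary arithmetic. As I recall the published test values, the expression below is zero on a correct floating-point unit, while affected Pentiums reportedly returned 256.)

    # FDIV check: on a correct FPU this prints 0.0; flawed Pentiums
    # reportedly returned 256.0 because the quotient 4195835 / 3145727
    # came back slightly wrong.
    x = 4195835.0
    y = 3145727.0
    print(x - (x / y) * y)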

Now that Apple is designing its own processors (not just the M-series for Mac computers but also the A-series for phones and tablets), Apple faces the risk of a similar problem.

It's possible that Apple will have a rather embarrassing problem with one of its processors. The question is: how will Apple handle it?

In my not-so-happy prediction, the problem will be more than an exploit that allows data to be extracted from the protected vault in the processor, or memory to be read across processes. It will be more severe. It will be a problem with the instruction set, much like Intel's FDIV problem.

If we assume that the situation will be roughly the same as the Intel problem, then we will see:

- A new processor (or a set of new processors) from Apple
- These processors will have been released; they will be in at least one product and perhaps more
- The problem will be rare, but repeatable. If one performs a specific sequence of operations, one can reproduce the problem

Apple may be able to correct the problem with an update. If so, then Apple's course is easy: an apology and an update. Apple may take some minor damage to its reputation, which will fade over time.

Or maybe the problem cannot be fixed with an update. The error might be "hard-coded" into the chip. Apple now has a few options, all of them bad but some less bad than others.

It can fix the problem, build a new set of processors, and then assemble new products and offer free replacements. Replacing the defective units is expensive for Apple, in the short term. It probably creates the most customer loyalty, which can improve revenue and profits in the longer term.

Apple could build a new set of products and, instead of offering free replacements, offer high trade-in values for the older units. That would be less expensive in the short term, but would earn less loyalty moving forward.

I'm not saying that this will happen. I'm saying that it may happen. I have no connection with Apple (other than as a customer) and no insight into their design process and quality assurance procedures.

Intel, when faced with the FDIV bug, handled it poorly. Yet Intel survives today, so its response was not fatal. Let's see what Apple does.