Showing posts with label performance comparisons. Show all posts

Friday, September 17, 2021

An iPad is not a Chromebook

Apple announced some new products this week. With the announcements, Apple made some claims about performance.

Apple compared the iPad to the Chromebook. Specifically, Apple claimed that the performance of the iPad was superior to the Chromebook.

The comparison is a misdirection. It doesn't make sense to compare an iPad to a Chromebook. And with changes in the Windows world, it doesn't make sense to compare MacBooks to Windows laptops, either.

Here's why: Apple tablets and Chromebooks use two different models of computing. Apple designs its applications to run on the local device. (Apple raised this point in their presentation, to emphasize privacy.) Chromebooks are designed to run apps on the web. For iPhones and iPads, the capabilities of the local processor are important. Apple needs hefty processors in its phones and tablets. For Chromebooks, it's more important to look at the cloud servers and the network connection. Google needs hefty processors, but in the servers. The processor in the Chromebook need only be powerful enough to send and receive data, and to render the screen image.

Google isn't alone in moving processing to web servers (or cloud servers). Microsoft is doing the same thing with its Office products and applications such as Teams. The computing model for Windows started as local processing, back in the 1990s. Today, some processing occurs on the local system and some processing occurs in the cloud. 

More and more, comparing Apple laptops to Windows laptops and comparing Apple phones to Android phones is comparing apples to oranges. (If you can forgive the pun.)

The difference in computing models guides the design for hardware. Apple has to develop fast processors for its tablets and laptops -- all of the processing occurs there. Microsoft and Google don't have the same pressure, because they can shift heavy processing to cloud servers. Google shifts almost all processing to servers, and Microsoft is gradually redesigning its applications to take advantage of cloud servers. The result is that Microsoft and Google don't need superfast processors in laptops and tablets. (Some buyers of Windows PCs, especially gamers, may seek the fastest processor, but for the average tasks, the processor is unimportant.)

I'm a little confused by Apple's comparison of its new processors to Chromebooks. Apple made a big point about it, but it doesn't make sense. A better comparison would be Apple comparing the new phone and tablet to previous generations of Apple's iPhone and iPad.

Unless -- perhaps the new processors in Apple's latest tablet and phone are not that much faster than the previous processors. Perhaps the new A15 processor is only mildly faster than the A14. In that case, the comparison between A14 and A15 would be... unimpressive. It might be that comparing the new iPad to a Chromebook makes for better marketing. Apple can throw out impressive-sounding factoids such as "30% faster than the most popular Chromebook".

I'm sure that journalists and Apple enthusiasts will compare the new iPhones and iPads to their predecessors and report the results. We should see those results shortly after the new iPhones and iPads become available.

What if the third-party, apples-to-apples comparisons show that the new phones and tablets are only slightly faster than the previous generation? Should we abandon Apple equipment? Not at all. Apple equipment is still well-designed and well-engineered. But we should take a more skeptical view of the information that Apple provides in its hour-long "event" advertisements.

Looking ahead, we can expect a similar hour-long event (advertisement) for the next generation of MacBook laptops and Mac desktops. As with phones and tablets, comparing the performance of a MacBook to a Windows laptop is not always meaningful. (Especially if the Windows laptop uses a lot of server-based apps, or virtual workstations.) If Apple touts the performance of the new MacBook against a Windows laptop (or worse, a Chromebook) then I expect the performance improvement of the new Macs will be... unimpressive. On the other hand, if Apple compares apples to apples, and provides impressive performance comparisons of new MacBooks against older MacBooks, then we can be reasonably certain that the new Macs perform much better.

Let's see what happens at the next Apple advertising event.

Wednesday, November 11, 2020

Flipping bits

Apple has introduced a new line of Mac computers which use Apple's new M1 system-on-a-chip. Apple made several claims about the performance, but those claims were vague -- at least in terms of measured performance. (Claims were in the form of "up to 2 times faster", "up to five times faster", "the fastest MacBook yet".)

Apple's M1 chip contains a processor that is based on ARM architecture, which is quite different from Intel's x86 architecture. How does one compare them?

One of the oldest comparisons of processors is MIPS (millions of instructions per second). Dating back to the mainframe age, it is simple in concept: count the number of instructions executed in each second of processing. But when comparing different architectures, an instruction-to-instruction comparison is not useful. The instruction sets are different: a computation that takes one instruction on one processor might take three or four instructions on another.

A later metric, FLOPS (floating-point operations per second) is somewhat better when the two systems are performing tasks that use a lot of numeric processing. This metric focusses on the instructions used for processing numeric values, specifically floating-point values, and looks at operations and not necessarily discrete instructions. But its focus is its weakness: Tasks that use integer arithmetic and fixed-point data representations aren't usefully measured with FLOPS metrics.

Thinking about the differences between Intel and ARM processors, perhaps we need a different metric, one that is robust for different architectures and different types of processing. And I have thought of one, although it may not be practical.

My idea for measuring performance is to execute a task or function and count the number of bits that are changed (or "flipped") during the process.

The advantage of this metric is that it is neutral to architectures and also neutral to the type of processing.  MIPS doesn't measure different instruction sets well and FLOPS omits integer operations. Bit-flipping, in contrast, works with all architectures and all data types. (I'll get to the downside of the bit-flipping metric later.)

The idea of bit flipping is simple: count each bit that changes state, either from 0 to 1 or from 1 to 0. If a register holds 0x0001 and is cleared, then one bit is changed (from 1 to 0). If a register holds 0x001f and is cleared, then five bits are changed. (If a register holds 0x0000 and it is cleared, then no bits are changed.)
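The counting rule above can be sketched in a few lines of code. This is a minimal illustration of the metric's arithmetic only, not a real measurement tool; the function name and example values are mine.

```python
# Count bits that changed state between a "before" and an "after" value.
# XOR produces a 1 exactly where the two values differ, so counting the
# 1 bits in the XOR result counts the flips.
def bit_flips(before: int, after: int) -> int:
    """Return the number of bits that changed between two values."""
    return bin(before ^ after).count("1")

print(bit_flips(0x0001, 0x0000))  # clearing 0x0001 flips one bit
print(bit_flips(0x001F, 0x0000))  # clearing 0x001f flips five bits
print(bit_flips(0x0000, 0x0000))  # clearing 0x0000 flips no bits
```

The same XOR-and-count step works on values of any width, which is what makes the metric indifferent to data type.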

Such a measure has the advantage of working with any type of data. We can use this measure with floating-point data and with integer data. We can even use it with BCD (binary-coded decimal) data. We can use it with text data and with objects.

Notice that this measure does not measure time. A comparison of the number of bits changed during a process is a count, not a time. One processor might be flipping more bits but running faster. A higher flip count does not necessarily mean slower performance.

One might think that such a limitation makes the bit-flip metric useless for performance comparisons. If one is interested in speed, and only speed, then the point is valid. Bit-flipping measures the efficiency of the processor: the same task with a lower bit-flip count is more efficient -- and therefore uses less energy. That may be useful to system designers.

One challenge with the bit-flip metric is deciding which bits count. Even though bits are all the same (a single value of either 1 or 0) different bits are stored in different locations and used for different purposes.

One simple difference is memory and registers. Processors store values in memory ("RAM") and also store values in registers, a small set of locations that are part of the processor. Registers are used for computations and change often; memory is used for storage and changes less frequently. Should the bit-flips for memory count the same as bit-flips in registers?

If we agree to count registers and memory differently, then we open the door to other distinctions in bits. Modern processors have caches, usually three levels, which store values too. Caches sit between registers and memory. If registers and memory have different "weights" for bit-flipping, caches should probably have different weights too.

We've been looking at data, but processors also use addresses. In order to read or write a value in memory, the processor must put an address on the bus to tell memory which cell it wants. Those bits change; should we count those changes too?

Addresses have more complications. Modern processors use an advanced form of memory management, which allows for multiple processes. Memory management changes the address that the process requests (a "logical" address) to the real address that holds the value (the "physical" address). Should those bit-flips count? (Keep in mind that physical memory pages are assigned at random, so two separate runs could have different physical pages and therefore different memory addresses and therefore different bit-flip counts.)

After careful consideration, I think the best solution is to count all of the bits. ("All the bits!") Categorize them if you want, report them in groups if you want, assign different weights to the different groups if you want, but count all of them.
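The count-everything-then-weight idea could be sketched like this. The group names and weight values here are illustrative assumptions, not measurements; the point is only that grouping and weighting are a reporting decision layered on top of raw counts.

```python
# Hypothetical bit-flip report: raw counts per group, combined with
# arbitrary (illustrative) weights chosen by the analyst.
WEIGHTS = {"registers": 1.0, "cache": 0.5, "memory": 0.1, "address_bus": 0.1}

def weighted_flips(counts: dict) -> float:
    """Sum per-group bit-flip counts, scaled by each group's weight."""
    return sum(WEIGHTS[group] * n for group, n in counts.items())

# Raw counts are preserved; the weighted total is just one view of them.
counts = {"registers": 1000, "memory": 200}
print(weighted_flips(counts))  # 1000*1.0 + 200*0.1 = 1020.0
```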

We could compare similar tasks on Windows and Mac, on Intel and ARM, even Microsoft Word and LibreOffice.

Which brings me to the one drawback to bit-flip counts.

It's hard to do.

Counting bit-flips requires access to memory, registers, and all sorts of innards of the processor. It's not something an application can do, or even a device driver or kernel extension. This kind of observation must occur in the processor; nothing else will do.

To effectively use bit-flipping, we need the cooperation of Intel and ARM (and Apple, which designs the M1 based on the ARM architecture). We also need clear and precise descriptions of the data we want to collect, so that the counts on Intel processors count the same things as counts on ARM processors.

We also need an answer to the meta-question about data collection: counting bit-flips in itself changes data values (the counters for bit-flip operations). Do we count those bit-flips too? (For the record, I am against counting those bit flips. Because if we count the bit-flip counter flips, then we have to count the counters for those bit flips, and then count the bit flips for those counters, and on and on. Enough is enough!)

Bit-flipping is a nice thought experiment. Implementation requires changes to processors and the cooperation of manufacturers, which is a tall order. I suspect it will remain a thought experiment.

Unless!

Unless... we limit bit-flips to something simple (such as only the CPU registers, and nothing else) and we leverage virtual modes, letting the "real" processor count the bit flips in the "virtual" processor. (Bit-flipping is a simple operation: XOR the "before" and "after" values, and then count the bits in the result.)
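The scaled-down scheme could look something like this: a hypervisor snapshots the virtual CPU's registers before and after each instruction and tallies the flips. This is a sketch under that assumption; the register names, snapshot values, and function are invented for illustration, since no real processor exposes such an interface today.

```python
# Scaled-down bit-flip counting: compare two register-file snapshots,
# XOR each register's before/after values, and sum the changed bits.
def flips_between_snapshots(before: dict, after: dict) -> int:
    """Total bits flipped across all registers between two snapshots."""
    return sum(bin(before[reg] ^ after[reg]).count("1") for reg in before)

# Hypothetical snapshots of a two-register virtual CPU.
snap_before = {"r0": 0x0F, "r1": 0x00}
snap_after  = {"r0": 0x00, "r1": 0x01}
print(flips_between_snapshots(snap_before, snap_after))  # 4 flips in r0 + 1 in r1 = 5
```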

Such an arrangement might just be possible. It still needs cooperation from the processor designers, but a scaled-down version is a much smaller ask.