Sunday, July 9, 2017

Cloud and optimizations

We all recognize that cloud computing is different.

It may be that cloud computing breaks some of our algorithms.

A colleague of mine, a long time ago, shared a story about programming early IBM mainframes. They used assembly language, because code written in assembly executed faster than code written in COBOL. (And for business applications on IBM mainframes, at the time, those were the only two options.)

Not only did they write in assembly language, they wrote code to be fast. That is, they "optimized" the code. One of the optimizations was with the "multiply" instruction.

The multiply instruction does what you think: it multiplies to numbers and stores the result. To optimize it, they wrote the code to place the larger of the two values in one register and the smaller of the two values in the other register. The multiply instruction was implemented as a "repeated addition" operation, so the second register was really a count of the number of addition operations that would be performed. By storing the smaller number in the second register, programmers reduced the number of "add" operations and improved performance.

(Technically inclined folks may balk at the notion of reducing a multiply operation to repeated additions, and observe that it works for integer values but not floating-point values. The technique was valid on early IBM equipment, because the numeric values were either integers or fixed-point values, not floating-point values.)

It was an optimization that was useful at the time, when computers were relatively slow and relatively expensive. Today's faster, cheaper computers can perform multiplication quite quickly, and we don't need to optimize it.

Over time, changes in technology make certain optimizations obsolete.

Which brings us to cloud computing.

Cloud computing is a change in technology. It makes available a variable number of processors.

Certain problems have a large number of possible outcomes, with only certain outcomes considered good. The problems could describe the travels of a salesman, or the number of items in a sack, or playing a game of checkers. We have algorithms to solve specific configurations of these problems.

One algorithm is the brute-force, search-every-possibility method, which does just what you think. While it is guaranteed to find an optimal solution, there are sometimes so many solutions (millions upon millions, or billions, or quintillions) that this method is impractical.

Faced with an impractical algorithm, we invent others. Many are iterative algorithms which start with a set of conditions and then move closer and closer to a solution by making adjustments to the starting conditions. Other algorithms discard certain possibilities ("pruning") which are known to be no better than current solutions. Both techniques reduce the number of tested possibilities and therefore reduce the time to find a solution.

But observe: The improved algorithms assume a set of sequential operations. They are designed for a single computer (or a single person), and they are designed to minimize time.

With cloud computing, we no longer have a single processor. We have multiple processors, each operating in parallel. Algorithms designed to optimize for time on a single processor may not be suitable for cloud computing.

Instead of using one processor to iteratively find a solution, it may be possible to harness thousands (millions?) of cloud-based processors, each working on a distinct configuration. Instead of examining solutions in sequence, we can examine solutions in parallel. The result may be a faster solution to the problem, in terms of "wall time" -- the time we humans are waiting for the solution.

I recognize that this approach has its costs. Cloud computing is not free, in terms of money or in terms of computing time. Money aside, there is a cost in creating the multiple configurations, sending them to respecting cloud processors, and then comparing the many results. That time is a cost, and it must be included in our evaluation.

None of these ideas are new to the folks who have been working with parallel processing. There are studies, papers, and ideas, most of which have been ignored by mainstream (sequential) computing.

Cloud computing will lead, I believe, to the re-evaluation of many of our algorithms. We may find that many of them have a built-in bias for single-processor operation. The work done in parallel computing will be pertinent to cloud computing.

Cloud computing is a very different form of computing. We're still learning about it. The application of concepts from parallel processing is one aspect of it. I won't be surprised if there are more. There may be all sorts of surprises ahead of us.

No comments: