Tuesday, March 27, 2018

A mainframe technique for improving web app performance

I've been working on a modern web application, making changes to improve its performance. For one improvement, I used an old technique from the days of mainframe processing.

I call it the "master file update" algorithm, although others may use different names.

It was commonly used when mainframe computers read data from tapes (or even punch cards). The program was to update a master file (say, account numbers and balances) with transactions (each with account numbers and transaction amount). The master file could contain bank accounts, or insurance policies, or something similar.

Why did mainframes use this technique? In short, they had to. The master file was stored on a magnetic tape, and transactions were on another tape (or perhaps punch cards). Both files were sequential-access files, and the algorithm read each file's records in sequence, writing a new master file along the way. Sequential access is the only way to read a file on magtape.

How did I use this technique? Well, I wasn't reading magnetic tapes, but I was working with two sets of data, one a "master" set of values and the other a set of "transactions". The original (slow) algorithm stored both sets of data in dictionary structures and required access by key values. Each access required a lookup into the dictionary. While the access was handled by the data structure's class, it still required work to find the correct value for each update.

The revised algorithm followed the mainframe technique: store the values in lists, not dictionaries, and make a single pass through the two sets of data. Starting with the first master value and the first transaction value, walk forward through the master values until the keys match. When the keys match, update the value and advance to the next transaction value. Repeat until you reach the end of both lists.

Dictionaries are good for "random access" to data. When you have to read (or update) a handful of values, and your updates are in no particular order, a dictionary structure is useful. There is a cost, as the dictionary as to find the item you want to read or update. Our situation was such that the cost of the dictionary was affecting our performance.

Lists are simpler structures than dictionaries. They don't have to search for the data; you move to the item in the list and read its value. The result is that read and write operations are faster. The change for a single operation was small; multiplied by the number of operations the change was a modest, yet noticeable, improvement.

Fortunately for me, the data was easily converted to lists, and it was in the right sequence. The revised code was faster, which was the goal. And the code was still readable, so there was no downside to the change.

The lesson for the day? Algorithms are usually not dependent on technology. The "master file update" algorithm is not a "mainframe technique", suitable only for mainframe applications. It works quite well in web apps. Don't assume that you must use only "web app" algorithms for web apps, and only "cloud app" algorithms for cloud apps. Such assumptions can blind you to good solutions. Keep you options open, and use the best algorithm for the job.

No comments: