Fitzpatrick's Fabulous Future: program design


Thursday, May 2, 2013

Our fickleness on the important aspects of programs

Over time, we have changed which attributes we desire in programs. If we divide the IT age into four eras, we can see this change. Let's consider the four eras to be mainframe, PC, web, and mobile/cloud. These four eras used different technology and different languages, and praised different accomplishments.

In the mainframe era, we focussed on raw efficiency. We measured CPU usage, memory usage, and disk usage. We strove to have enough CPU, memory, and disk, with some to spare but not too much. Hardware was expensive, and too much spare capacity meant that you were paying for more than you needed.

In the PC era we focussed not on efficiency but on user-friendliness. We built applications with help screens and menus. We didn't care too much about efficiency -- many people left PCs powered on overnight, with no "jobs" running.

With web applications, we focussed on globalization, with efficiency as a sub-goal. The big effort was in the delivery of an application to a large number of users. This meant translation into multiple languages, the "internationalization" of an application, support for multiple browsers, and support for multiple time zones. But we didn't want to overload our servers, either, so early Perl CGI applications were quickly converted to C or other languages for performance.

With applications for mobile/cloud, we desire two aspects: For mobile apps (that is, the 'UI' portion), we want something easier than "user-friendly". The operation of an app must not merely be simple, it must be obvious. For cloud apps (that is, the server portion), we want scalability. An app must not be monolithic, but assembled from collaborative components.

The objectives for systems vary from era to era. Performance was a highly measured aspect in the mainframe era, and almost ignored in the PC era.

The shift from one era to another may be difficult for practitioners. Programmers in one era may be trained to "optimize" their code for the dominant aspect. (In the mainframe era, they would optimize for performance.) A succeeding era would demand other aspects in their systems, and programmers may not be aware of the change. Thus, a highly praised mainframe programmer with excellent skills at algorithm design, when transferred to a PC project, may find that his skills are not desired or recognized. His code may receive a poor review, since the expectation for PC systems is "user friendly" and his skills from mainframe programming do not provide that aspect.

Similarly, a skilled PC programmer may have difficulties when moving to web or mobile/cloud systems. The expectations for user interface, architecture, and efficiency are quite different.

Practitioners who start with a later era (for example, the 'young turks' starting with mobile/cloud) may find it difficult to comprehend the reasoning of programmers from an earlier era. Why do mainframe programmers care about the order of mathematical operations? Why do PC programmers care so much about in-memory data structures, to the point of writing their own?

The answers are that, at the time, these were important aspects of programs. They were pounded into the programmers of earlier eras so thoroughly that those programmers now apply these optimizations without thinking about them.

Experienced programmers must look at the new system designs and the context of those designs. Mobile/cloud needs scalability, and therefore needs collaborative components. The monolithic designs that optimized memory usage are unsuitable to the new environment. Experienced programmers must recognize their learned biases and discard those that are not useful in the new era. (Perhaps we can consider this a problem of cache invalidation.)

Younger programmers would benefit from a deeper understanding of the earlier eras. Art students study the conditions (and politics) of the old masters. Architects study the buildings of the Greeks, Romans, and medieval kingdoms. Programmers familiar with the latest era, and only the latest era, will have a difficult time communicating with programmers of earlier eras.

Each era has objectives and constraints. Learn about those objectives and constraints, and you will find a deeper appreciation of programs and a greater ability to communicate with other programmers.

Sunday, April 28, 2013

C++ without source (cpp) files

A thought experiment: can we have C++ programs without source files (that is, without .cpp files)?

The typical C++ program consists of header files (.h) and source files (.cpp). The header files provide the class definitions, and the source files provide the implementations of their member functions.

Yet the C++ language allows one to define function implementations in the header files. We typically see this only for short functions. To wit:

random_file.h

#ifndef RANDOM_FILE_H
#define RANDOM_FILE_H

class random_class
{
private:
    int foo_;
public:
    random_class( int foo ) : foo_(foo) {}
    int foo() const { return foo_; }
};

#endif

This code defines a small class that contains a single value and has no methods other than the constructor and one read-only accessor. The sole member variable is initialized in the constructor.

Here's my idea: Using the concepts of functional programming (namely immutable variables that are initialized in the constructor), one can define a class as a constructor and a bunch of read-only accessors.

If we keep class size to a minimum, we can define all classes in header files. The constructors are simple, and the accessor functions simply return calculated values. There is no need for long methods.

(Yes, we could define long functions in headers, but that seems to be cheating. We allow short functions in headers and exile long functions into .cpp files.)

Such a design is, I think, possible, although perhaps impractical. It may be similar to the chemists' "perfect gas", an abstraction that is nice to conceive but unseen in the real world.

Yet a "perfect gas" of a class (perhaps a "perfect class") may be possible for some classes in a program. Those perfect classes would be small, with few member variables and only accessor functions. Their values would be immutable. The member variables may be objects of smaller classes (perhaps perfect classes) with immutable values of their own.

This may be a way to improve code quality. My experience shows that immutable objects are much easier to code, to use, and to debug. If we build simple immutable classes, then we can code them in header files and we can discard the source files.

Coding without source files -- now there is an idea for the future.

Tuesday, August 28, 2012

The deception of C++'s 'continue' and 'break'

Pick up any C++ reference book, visit any C++ web site, and you will see that the 'continue' and 'break' keywords are grouped with the loop constructs. In many ways it makes sense, since you can use these keywords with only those constructs.

But the more I think about 'continue' and 'break', the more I realize that they are not loop constructs. Yes, they are closely associated with 'while' and 'for' and 'switch' statements, but they are not really loop constructs.

Instead, 'continue' and 'break' are variations on a different construct: the 'goto' keyword.

The 'continue' and 'break' statements in loops bypass blocks of code. 'continue' transfers control to the end of the loop body, so that the next iteration can begin. 'break' transfers control out of the loop entirely, so that execution resumes with the code after the loop. These are not loop operations but 'transfer of control' operations, or 'goto' operations.

Now, modern programmers have declared that 'goto' operations are evil and must never, ever be used. Therefore, 'continue' and 'break', as 'goto' in disguise, are evil and must never, ever be used.

(The 'break' keyword can be used in 'switch/case' statements, however. In that context, a 'goto' is exactly the construct that we want.)

Back to 'continue' and 'break'.

If 'continue' and 'break' are merely cloaked forms of 'goto', then we should strive to avoid their use. We should seek out the use of 'continue' and 'break' in loops and re-factor the code to remove them.

I will be looking at code in this light, and searching for the 'continue' and 'break' keywords. When working on systems, I will make their removal one of my metrics for the improvement of the code.

Sunday, July 1, 2012

Our technology shapes our systems

In the old days, computer programs were fairly linear things. They processed data in a linear fashion, and the source code often appeared in a linear fashion.

Early business applications of computers were for accounting applications: general ledger, accounts payable, payroll, inventory... etc. These systems were often designed with a master file and one or more transaction files. The master file held information about customers, accounts, and inventory, and the transaction files held information about, well, transactions, discrete events that changed something in the master file. (For example, a bank's checking accounts would have balances in the master file, and records in the transaction file would adjust those balances. Deposits would increase a balance, checks or fees would decrease a balance.)

The files were not stored on modern devices such as USB drives or even floppy disks. In the early days, the master file was on magnetic tape, and the transactions were on punch cards.

The thing about magnetic tape is that you must run through it from beginning to end. (Much like a tour through an Ikea store.) You cannot jump around from one position to another; you must start with the first record, then process the second record, and in sequence process every record until the last.

The same holds for punch cards. Paper punch cards were placed in a hopper and read and processed one at a time.

You might wonder how you can handle processing of accounts with such restrictions in place. One pass through the master file? And only one pass through the transactions? How can we match transactions to master records if we cannot move to the proper record?

The trick was to align the input files, keeping the master file sorted and sorting the transactions before starting the update process. With a bit of thought, you can imagine a system that reads a master record and a transaction record and compares the account numbers on each (both records need a key for matching). If they match, then the system updates the master record and moves on to the next transaction. If they don't match, then the system writes the master record to another tape (the output tape), reads the next master record, and runs the comparison again. The algorithm does work (although I have simplified it somewhat), and this was a common model for program design.

The rise of direct-access storage devices and complex data structures has changed programming. As processors became less expensive and more powerful, as programming languages became more expressive and allowed complex data structures such as lists and trees, as memory became available to hold complex data structures in their entirety, our model for programming became more complex. No longer were programs limited to the simple cycle of "read-master, read-transaction, compare, update, write-master, repeat".

Programming in that model (perhaps we could call it the "Transaction Pattern") was easy and low-risk because clever people figured out the algorithm and other people could copy it.

This notion of a common system model is not unique to 1960s-style programming. Microsoft Windows programs at the API level follow a specific pattern: a "message pump" loop retrieves messages from Windows and dispatches them to the program's window procedures. Android programs use a similar technique.

Tablet/cloud systems will probably develop one (or perhaps a handful) of common patterns, repeated (perhaps with some variations) for the majority of applications. The trick will be to identify the patterns that let us leverage the platform with minimal thought and risk. Keep your eyes open for common templates for systems. When you find one that works, when you find one that lets lots of people leverage the cleverness of a few individuals, stick with it.

I'm guessing that the system model will not be a purely linear one, as we had in the 1960s. But it may have linear aspects, with message queues serializing transactions and updates.

Monday, May 14, 2012

Just how big is too big?

I recently read (somewhere on the internet) that the optimal size of a class is 70 lines of code (LOC).

My initial thought on such a size for classes was that it was extremely small, too small to be practical. Indeed, with some languages and frameworks, it is not possible to create a class with fewer than 70 lines of code.

Yet after working with "Immutable Object Programming" techniques, I have come to believe that classes of size 70 LOC are possible -- and practical. A recent project saw a number of classes (not all of them, but many) on the order of 70 LOC. Some were slightly larger (perhaps 100 LOC), some larger still (250 LOC), and a few very large (1000 LOC). A few classes were smaller.

The idea of smaller classes is not new. Edward Yourdon, in his 1975 work "Techniques of Program Structure and Design", states that some organizations set a limit on module size of 50 LOC. At the time, object-oriented programming was unknown to most of the profession (although the notion of classes had existed since Simula 67), so a module is a reasonable substitute for a class.

What I find interesting is the similarity of optimal sizes. For classes, 70 LOC. For modules, 50 LOC. I think that this may tell us something about our abilities as programmers.

I will also observe that 70 lines is about the size of three screens of text -- if we consider a "screen" to be the olde standard size of 24 lines with 80 characters (3 × 24 = 72 lines). That may tell us about our abilities, too.
