Showing posts with label best practices. Show all posts

Thursday, September 5, 2013

Measure code complexity

We measure many things on development projects, from the cost to the time to user satisfaction. Yet we do not measure the complexity of our code.

One might find this surprising. After all, complexity of code is closely tied to quality (or so I like to believe) and also an indication of future effort (simple code is easier to change than complicated code).

The problem is not in the measurement of complexity. We have numerous techniques and tools, spanning the range from "lines of code" to function points. There are commercial tools and open source tools that measure complexity.
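To make the point concrete, here is a sketch (my own illustration, not one of the commercial tools) of a simplified cyclomatic-style count in Python, charging one unit of complexity for each branching construct:

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 plus one per decision point."""
    tree = ast.parse(source)
    decisions = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                 ast.BoolOp, ast.IfExp)
    return 1 + sum(isinstance(node, decisions) for node in ast.walk(tree))

code = """
def classify(n):
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    return "positive"
"""
print(cyclomatic_complexity(code))  # 3: one base path plus two branches
```

Real tools count more cases (boolean operands, 'match' arms, and so on), but the mechanics are clearly not the hard part.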

No, the problem is not in techniques or tools.

It is a matter of will. We don't measure complexity because, in short, we don't want to.

I can think of a few reasons that discourage the measurement of source code complexity.

- The measurement of complexity is a negative one. That is, more complexity is worse. A result of 170 is better than a result of 270, and this inverted scale is awkward. We are trained to like positive measurements, like baseball scores. (Perhaps golf would see more interest if enthusiasts changed its scoring system.)

- There is no direct way to connect complexity to cost. While we understand that a complicated code base is harder to maintain than a simple one, we have no way of converting that extra complexity into dollars. If we reduce our complexity from 270 to 170 (a 37 percent reduction), do we reduce the cost of development by the same percentage? Why or why not? (I suspect that there is a lot to be learned in this area. Perhaps several Masters theses can be derived from it.)

- Not knowing the complexity shifts risk from managers to developers. In organizations with antagonistic relations between managers and developers, a willful ignorance of code complexity pushes risk onto developers. Estimates, if made by managers, will ignore complexity. Estimates made by developers may be optimistic (or pessimistic) but may be adjusted by managers. In either case, schedule delays will be the fault of the developer, not the manager.

- Developers (in shops with poor management relations) may avoid the use of any metrics, fearing that they will be used for performance evaluations.

Looking forward, I can see a time when we do measure code complexity.

- A company considering the acquisition of software (including the source code) may want an unbiased opinion of the code. They may not completely trust the seller (who is biased towards the sale) and they may not trust their own people (who may be biased against 'outside' software).

- A project team may want to identify complex areas of their code, to identify high-risk areas.

- A development team may wish to estimate the effort for maintaining code, and may include the complexity as a factor in that effort.

The tools are available.

I believe that we will, eventually, consider complexity analysis a regular part of software development. Perhaps it will start small, like the adoption of version control and automated testing. Both of those techniques were at one time considered new and unproven. Today, they are considered 'best practices'.

Sunday, July 28, 2013

The style curve

At a recent conference, a fellow attendee asked about best practices for the live tiles on Windows 8.

Live tiles are different from the standard icons in that they can show information and change over time. Windows 8 comes with a number of live tiles: the clock, news, Bing search, and entertainment are a few.

For the question of best practices, my take is that we're too early in what I call the "style curve" of Windows live tiles. The style curve is similar to the "hype curve", in which new technologies are born, receive some hype, then become disparaged as they fail to cure all of our ills, and finally become accepted as useful. See more about the hype curve on Wikipedia.

The style curve applies to new technologies, and is similar to the hype curve in that a technology is created, given lots of attention, disparaged, and then accepted. Here are the phases:

Creation The technology is created and made available.

Experimentation People test out the new technology and test its limits.

Overuse People adopt the new technology and use it, but with poor judgement. They use it for too many things, or too many situations, or with too many combinations.

Avoidance People dislike the overuse (or the poor taste) and complain. Some actively avoid the new technology.

Best practices A few folks use the technology with good taste. They demonstrate that the technology can be used without offending people's sensibilities. The techniques they use are dubbed "best practices".

Acceptance The techniques of restrained use (the best practices) are adopted by most folks.

Previous technologies have followed this curve. Examples include typefaces and fonts in desktop publishing (and later word processing) and animated images in web pages.

Some readers will remember the early days of the web and some of the garish designs that were used. The memories of spinning icons and blinking text may still be painful. This was the "overuse" phase of the style curve for web pages. Several shops banned outright the use of the blink tag -- the "avoidance" phase. Now people understand good design principles for web pages. (Which do not include the blink tag, thankfully.)

Desktop publishing, powered by Windows and laser printers, allowed people to use a multitude of typefaces and fonts in their documents. And use them they did. Today we use a limited set of typefaces and fonts in any one document, and shops have style guides.

Coming back to live tiles, I think we are at the "experimentation" phase of the style curve. We don't know the limits on live tiles and we don't know the best practices. We have to go through the "overuse" and "avoidance" phases before we can get to "best practices". In other words, the best practices are a matter of knowing what not to do. But we have to try everything to see what works and what doesn't work.

Be prepared for some ugly, garish, and annoying live tiles. But know that style will arrive in the future.

Saturday, May 25, 2013

Best practices are not best forever

Technology changes quickly. And with changes in technology, our views of technology change, and these views affect our decisions on system design. Best practices in one decade may be inefficient in another.

A recent trip to the local car dealer made this apparent. I had brought my car in for routine service, and the mechanic and I reviewed the car's maintenance history. The dealer has a nice, automated system to record all maintenance on vehicles. It has an on-line display and prints nicely-formatted maintenance summaries. A "modern" computer system, probably designed in the 1980s and updated over the years. (I put the word "modern" in quotes because it clearly runs on a networked PC with a back end database, but it does not have tablet or phone apps.)

One aspect of this system is the management of data. After some amount of time (it looks like a few years), maintenance records are removed from the system.

Proper system design once included the task of storage management. A "properly" designed system (one that followed "best practices") would manage data for the users. Data would be retained for a period of time but not forever. One had to erase information, because the total available space was fixed (or additional space was prohibitively expensive) and programming the system to manage space was more effective than asking people to erase the right data at the right time. (People tend to wait until all free storage is used and then binge-erase more data than necessary.)

That was the best practice -- at the time.

Over time, the cost of storage dropped. And over time, our perception of the cost of storage dropped.

Google has a big role in our new perception. With the introduction of GMail, Google gave each account holder a full gigabyte of storage. A full gigabyte! The announcement shocked the industry. Today, it is a poor e-mail service that cannot promise a gigabyte of storage.

Now, Flickr is giving each account holder a full terabyte of storage. A full terabyte! Even I am surprised at the decision. (I also think that it is a good marketing move.)

Let's return to the maintenance tracking system used by the car dealer.

Such quantities of storage vastly surpass the meager storage used by a few maintenance records. Maintenance records each take a few kilobytes of data (it's all text, and only a few pages). A full megabyte of data would hold all maintenance records for several hundred repairs and check-ups. If the auto dealer assigned a full gigabyte to each customer, they could easily hold all maintenance records for the customer, even if the customer brought the car for repairs every month for an extended car-life of twenty years!
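The arithmetic is easy to check. Using assumed numbers (a few kilobytes per record, and a pessimistic customer who visits every month):

```python
# Back-of-the-envelope check of the storage claim, with assumed sizes.
record_kb = 4            # assume a few kilobytes per maintenance record
visits_per_year = 12     # a customer who brings the car in monthly
years = 20               # an extended car life
total_kb = record_kb * visits_per_year * years
print(total_kb)          # 960: twenty years of monthly visits fit in a megabyte
```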

Technology has changed. Storage has become inexpensive. Today, it would be a poor practice to design a system that auto-purges records. You spend more on the code and the tests than you save on the reduction in storage costs. You lose older customer data, preventing you from analyzing trends over time.

The new best practices of big data, data science, and analytics require data. Old data has value, and the value is more than the cost of storage.

Best practices change over time. Be prepared for changes.




Friday, August 24, 2012

How I fix old code

Over the years (and multiple projects) I have developed techniques for improving object-oriented code. My techniques work for me (and the code that has been presented to me). Here is what I do:

Start at the bottom Not the base classes, but the bottom-most classes. The classes that are used by other parts of the code, and have no dependencies. These classes can stand alone.

Work your way up After fixing the bottom classes, move up one level. Fix those classes. Repeat. Working up from the bottom is the only way I have found to be effective. One can have an idea of the final result, a vision of the finished product, but only by fixing the problems at the bottom can one achieve any meaningful results.

Identify class dependencies To start at the bottom, one must know the class dependencies. Not the class hierarchy, but the dependencies between classes. (Which classes use which other classes at run-time.) I use some custom Perl scripts to parse code and create a list of dependencies. The scripts are not perfect but they give me a good-enough picture. The classes with no dependencies are the bottom classes. Often they are utility classes that perform low-level operations. They are the place to start.
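My scripts are specific to the code they parse, but the bottom-up ordering they produce can be sketched in a few lines of Python. The class names and dependencies here are hypothetical:

```python
# Hypothetical dependency map: class -> classes it uses at run time.
deps = {
    "ReportWriter": {"Formatter", "FileStore"},
    "Formatter":    {"StringUtils"},
    "FileStore":    {"StringUtils"},
    "StringUtils":  set(),
}

def levels(deps):
    """Group classes into levels: level 0 has no dependencies,
    and each later level depends only on the levels before it."""
    remaining = dict(deps)
    result = []
    while remaining:
        done = {c for lvl in result for c in lvl}
        level = sorted(c for c, d in remaining.items() if d <= done)
        if not level:
            raise ValueError("cyclic dependency")
        result.append(level)
        for c in level:
            del remaining[c]
    return result

print(levels(deps))
# [['StringUtils'], ['FileStore', 'Formatter'], ['ReportWriter']]
```

Level 0 holds the bottom classes -- the place to start fixing.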

Create unit tests Tests are your friends! Unit tests for the bottom (stand-alone) classes are generally easy to create and maintain. Tests for higher-level classes are a little trickier, but possible with immutable lower-level classes.
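As a sketch (with a hypothetical Money class, not code from any real project), a bottom-class test can be this small:

```python
import unittest

# A hypothetical bottom-level class: no dependencies, easy to test alone.
class Money:
    def __init__(self, cents: int):
        self._cents = cents

    def cents(self) -> int:
        return self._cents

    def add(self, other: "Money") -> "Money":
        # Immutable style: adding returns a new object.
        return Money(self._cents + other._cents)

class MoneyTest(unittest.TestCase):
    def test_add_returns_new_object(self):
        a, b = Money(150), Money(25)
        total = a.add(b)
        self.assertEqual(total.cents(), 175)
        self.assertEqual(a.cents(), 150)  # the original is unchanged

unittest.main(argv=["money_test"], exit=False, verbosity=0)
```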

Make objects immutable The Java String class (and the C# String class) showed us a new way of programming. I ignored it for a long time (too long, in my opinion). Immutable objects are unchangeable, and do not have the "classic" object-oriented functions for setting properties. Instead, they are fixed to their original value. When you want to change a property, the immutable object techniques dictate that instead of modifying an object you create a new object.

I start by making the lowest-level classes immutable, and then working my way up the "chain" of class dependencies.
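In Python, a frozen dataclass gives the same effect. A sketch with a hypothetical Point class:

```python
from dataclasses import dataclass

# A hypothetical immutable value class: to "change" a property,
# construct a new object instead of modifying this one.
@dataclass(frozen=True)
class Point:
    x: float
    y: float

    def moved(self, dx: float, dy: float) -> "Point":
        return Point(self.x + dx, self.y + dy)

p = Point(1.0, 2.0)
q = p.moved(3.0, 0.0)
print(p)  # Point(x=1.0, y=2.0) -- the original is untouched
print(q)  # Point(x=4.0, y=2.0)
# Assignment such as p.x = 9.0 raises dataclasses.FrozenInstanceError.
```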

Make member variables private Create accessor functions when necessary. I prefer to create "get" accessors only, but sometimes it is necessary to create "set" accessors. I find it easier to track and identify access with functions than with member variables, but that may be an effect of Visual Studio. Once the accessors are in place, I forget about the "get" accessors and look to remove the "set" accessors.

Create new constructors Constructors are your friends. They take a set of data and build an object. Create the ones that make sense for your application.

Fix existing constructors to be complete Sometimes people use constructors to partially construct objects, relying on the code to call "set" accessors later. Immutable object programming has none of that nonsense: when you construct an object you must provide everything. If you cannot provide everything, then you are not allowed to construct the object! No soup (or object) for you!
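A sketch of a complete constructor, with hypothetical fields: everything is supplied and validated up front, and nothing waits for a later "set" call.

```python
class Invoice:
    # Complete construction: every field is supplied and checked here.
    # There are no "set" accessors to finish the object later.
    def __init__(self, number: str, customer: str, total_cents: int):
        if not number or not customer:
            raise ValueError("an invoice needs a number and a customer")
        if total_cents < 0:
            raise ValueError("an invoice total cannot be negative")
        self._number = number
        self._customer = customer
        self._total_cents = total_cents

inv = Invoice("2013-0042", "Acme Corp", 15000)   # fine
# Invoice("", "Acme Corp", 15000) raises ValueError: no object for you.
```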

When possible, make member functions static Static functions have no access to member variables, so one must pass in all "ingredient" variables. This makes it clear which variables must be defined to call the function. Not all member functions can be static; make the functions called by constructors static when possible. (Really, put the effort into this task.) Calls to static functions can be re-sequenced at will, since they cannot have side effects on the object.

Static functions can also be moved from one class to another, at will. Or at least easier than member functions. It's a good attribute when re-arranging code.
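A sketch with a hypothetical class: the static helper receives every "ingredient" as a parameter, so it cannot touch member variables, and calls to it can be reordered or moved to another class with little fuss.

```python
class Rectangle:
    def __init__(self, width: float, height: float):
        self._width = width
        self._height = height
        # The constructor calls a static helper; the helper cannot
        # read or write members, so all of its inputs are visible here.
        self._area = Rectangle._compute_area(width, height)

    @staticmethod
    def _compute_area(width: float, height: float) -> float:
        return width * height

    def area(self) -> float:
        return self._area
```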

Reduce class size Someone (I don't remember where) claimed that the optimum class size was 70 lines of code. I tend to agree with this size. Bottom classes can easily be expressed in 70 lines. (If not, they are probably composites of multiple elementary classes.) Higher-level classes can often be represented in 70 lines or less, sometimes more. (But never more than 150 lines.)

Reducing class size usually means increasing the number of classes. Your code size may shrink somewhat (my experience shows a reduction of 40 to 60 percent) but it does not reduce to zero. Smaller classes often mean more classes. I find that a system with more, smaller classes is easier to understand than one with fewer, large classes.

Name your classes well Naming is one of the great challenges of programming. Pick names carefully, and change names when it makes sense. (If your version control system resists changes to class names, get a new version control system. It is the servant, not you!)

Talk with other developers Discuss changes with other developers. Good developers can provide useful feedback and ideas. (Poor developers will waste your time, though.)

Discuss code with non-developers Our goal is to create code that can be read by non-developers who are experts in the subject matter. We want them to read our code, absorb it, and provide feedback. We want them to say "yes, that seems right" (or even better, "oh, there is a problem here with this calculation"). To achieve that level of understanding, we need to strip away all of the programming overhead: temporary variables, memory allocation, and sequence/iteration gunk. With immutable object programming, meaningful names, and modern constructs (in C++, that means BOOST) we can create high-level routines that are readable by non-programmers.

(Note that we are not asking the non-programmers to write code, merely to read it. That is enough.)

These techniques work for me (and the folks on my projects). Your mileage may vary.

Monday, February 27, 2012

Uglies

Programming has seen a number of ugly things, and we programmers (and more specifically, language designers) have improved them.

The GOTO statement

The most famous ugly thing in programming is probably the GOTO statement. First called out by Edsger Dijkstra in the Communications of the ACM, (description here on wikipedia) it is the poster child of poor programming practices. The GOTO statement was a direct analog of the assembly language "jump" instruction (often assigned the code 'JMP', but it varied from processor to processor), and it allowed for difficult-to-read programs. We improved programming languages with structured programming, 'if/then/else' statements, 'while' loops, and iterations over collections. (The 'goto' statement was omitted from Java, but remains in C, C++, and C#.)

Global variables

Global variables were another form of ugliness, allowing any part of a program to read (or, more alarmingly, modify) a variable. One never could tell what value a global variable would contain. They were mandatory in COBOL; present in FORTRAN, C, and C++; removed in Java and C#.

Arithmetic IF

FORTRAN has the honor of originating the 'arithmetic IF', a three-destination comparison of a value. (It was not limited to FORTRAN, the thing would show up later in FOCAL.) One of three GOTO statements would be executed, based on the sign of an expression. (The sign could be positive, negative, or zero.) This nasty beast was the result of the IBM 704 instruction set, which allowed such a construct in one instruction. Efficient for the processor, but not so much for the programmers.

Pointers

Pointers were available in Pascal and the life-blood of C. In Pascal (and in C) they allowed the construction of numerous data structures, many of which were impossible in earlier languages. Yet pointers were also a form of "GOTO in data" and led to lots of headaches. They were eventually replaced by references, which were pointers bound to a known valid entity.

Memory management

The pointers in C (and Pascal, somewhat) demanded memory management. One could allocate memory for anything, but one also had to track that memory and release it when one was finished with it. The later languages of Visual Basic, Perl, Java, C#, Python, and Ruby all replaced manual memory management with garbage collection.

Early garbage collection algorithms were unpredictable and often caused performance problems. Later algorithms (and faster processors) made garbage collection practical.

Column-dependent coding

FORTRAN (and to some extent COBOL) treated column position as significant to the compiler. FORTRAN was locked into a restrictive format that specified columns for line numbers and statements (and statement continuation on a successive 'source card'). COBOL's use of layout was more advanced. While optional, indentation was a popular convention, and significant indentation saw life again in Python's block definitions.

Short variable names

BASIC variable names were initially limited to a single letter and an optional digit. (No 'R2D2' for you!) Original FORTRAN limited variable names to six characters. Early PC compilers for BASIC, Pascal, and C had similar restrictions. Modern compilers and interpreters allow variable names longer than I care to type (and I type some long names!).

The overall trend

Looking back, we can see that lots of ugly programming constructs were made for efficiency (arithmetic IF) or due to limitations of memory (short variable names) or processing power (GOTO). Advances in hardware allowed for work to be shifted from programmers to compilers, interpreters, and run-time systems. But here's the thing: advances in programming languages and techniques are much slower than advances in hardware.

Today's computers are more powerful than those of the 1960s by several orders of magnitude. While we have replaced GOTO with structured programming and manual memory management with garbage collection, the change in software is much smaller than the change in hardware.

I tend to think that this effect is caused by the locality of software. We programmers are close to our programs; the hardware is remote, sitting on the far side of the compiler. We can, should we choose, replace the hardware with faster (compatible) equipment, or switch to a new target processor by changing the back end of the compiler. In contrast, the programming language constructs are close to us, living inside our heads. We think in our programming languages and are loath to give them up.

Moreover, we programmers often learn to overcome the ugly aspects of programming languages and sometimes develop techniques to leverage them. We become attached to these tricks, and we are quite reluctant to let them go.

If we want to advance the art, we will have to give up the old (ugly) constructs and adopt the new techniques. It is not easy; I myself have had to give up BASIC for C, C for C++, C++ for Java and C#, and C# for Ruby. I have given up unstructured programming for structured programming, structured (procedural) programming for object-oriented programming, and object-oriented programming for immutable-object programming. Each transition has been difficult, in that I had to un-learn the old ways. Yet I find that the new languages and techniques are better and allow me to be more effective.

Sunday, December 11, 2011

Tradeoffs

It used to be that we had to write small, fast programs. Processors were slow, storage media (punch cards, tape drives, disc drives) were even slower, and memory was limited. In such a world, programmers were rewarded for tight code, and DP managers were rewarded for maintaining systems at utilization rates of ninety to ninety-five percent of machine capacity. The reason was that a higher rate meant that you needed more equipment, and a lower rate meant that you had purchased (or more likely, leased) too much equipment.

In that world, programmers had to make tradeoffs when creating systems. Readable code might not be fast, and fast code might not be readable (and often both were true). Fast code won out over readable (slower) code. Small code that squeezed the most out of the hardware won out over readable (less efficient) code. The tradeoffs were reasonable.

The world has changed. Computers have become more powerful. Networks are faster and more reliable. Databases are faster, and we have multiple choices of database designs -- not everything is a flat file or a set of related tables. Equipment is cheap, almost commodities.

This change means that the focus of costs now shifts. Equipment is not the big cost item. CPU time is not the big cost item. Telecommunications is not the big cost item.

The big problem of application development, the big expense that concerns managers, the thing that will get attention, will be maintenance: the time and cost to modify or enhance an existing system.

The biggest factor in maintenance costs, in my mind, is the readability of the code. Readable code is easy to change (possibly). Opaque code is impossible to change (certainly).

Some folks look to documentation, such as design or architecture documents. I put little value in documentation; I have always found the code to be the final and most accurate description of the system. Documents suffer from aging: they were correct at some point, but the system has since been modified. Documents suffer from imprecision: they specify some but not all of the details. Documents suffer from inaccuracy: they specify what the author thought the system was doing, not what the system actually does.

Sometimes documentation can be useful. The business requirements of a system can be useful. But I find "System architecture" and "Design overview" documents useless.

If the code is to be the documentation for itself, then it must be readable.

Readability is a slippery concept. Different programmers have different ideas about "readability". What is readable to me may not be readable to you. Over my career, my ideas of readability have changed, as I learned new programming techniques (structured programming, object-oriented programming, functional programming), and even as I learned more about a language (my current ideas of "readable" C++ code are very different from my early ideas of "readable" C++ code).

I won't define readability. I will let each project decide on a meaningful definition of readability. I will list a few ideas that will let teams improve the readability of their code (however they define it).

Version control for source code A shop that is not using version control is not serious about software development. There are several reliable, well-documented, well-supported, and popular systems for version control. Version control lets multiple team members work together and coordinate their changes.

Automated builds An automated build lets you build the system reliably, consistently, and at low effort. You want the product for the customer to be built with a reliable and consistent method.

Any developer can build the system Developers need to build the system to run their tests. They need a reliable, consistent, low-effort, method to do that. And it has to work with their development environment, allowing them to change code and debug the system.

Automated testing Like version control, automated testing is necessary for a modern shop. You want to test the product before you send it to your customers, and you want the testing to be consistent and reliable. (You also want it easy to run.)

Any developer can test the system Developers need to know that their changes affect only the behaviors that they intend, and no other parts of the system. They need to use the tests to ensure that their changes have no unintended side-effects. Low-effort automated tests let them run the tests often.

Acceptance of refactoring To improve code, complicated classes and modules must be changed into sets of smaller, simpler classes and modules. Refactoring changes the code without changing the external behavior of the code. If I start with a system that passes its tests (automated tests, right?) and I refactor it, it should pass the same tests. When I can rearrange code, without changing the behavior, I can make the code more readable.
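A tiny illustration (hypothetical code, not from any real project): the refactored version rearranges one complicated function into smaller named pieces, and the same check passes before and after.

```python
# Before: one function doing everything.
def summarize(orders):
    total = 0
    for o in orders:
        if o["shipped"]:
            total += o["price"] * o["qty"]
    return total

# After: the same external behavior, split into smaller named pieces.
def line_total(order):
    return order["price"] * order["qty"]

def summarize_refactored(orders):
    return sum(line_total(o) for o in orders if o["shipped"])

orders = [{"price": 5, "qty": 2, "shipped": True},
          {"price": 9, "qty": 1, "shipped": False}]
# Refactoring did not change the answer.
assert summarize(orders) == summarize_refactored(orders) == 10
```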

Incentives for developers to use all of the above Any project that discourages developers from using automated builds or automated tests, either explicitly or implicitly, will see little or no improvements in readability.

But the biggest technique for readable code is that the organization -- its developers and managers -- must want readable code. If the organization is more concerned with "delivering a quality product" or "meeting the quarterly numbers", then they will trade off readability for those goals.