Sunday, October 4, 2009

Limits to Growth

Did you know that Isaac Newton, esteemed scientist and notable man of the Church, once estimated the theoretical maximum height for trees? I didn't, until I read a recent magazine article. It claimed that he calculated the maximum height as 300 feet, using strength and weight formulas.

I have found no other reference to confirm this story, but perhaps its truth is less important than the idea that one can calculate a theoretical maximum.

For trees, the calculation is straightforward. Weight is a function of volume, so for a tree that keeps its proportions it grows with the cube of the height. Strength is a function of the cross-section of the trunk, so it grows only with the square. Chart the two as curves on a graph and they are not parallel; because the cube outpaces the square, the curves must cross, and the point at which they cross is the theoretical maximum. (There are a few other factors, such as the density of the wood, and they can be included in the calculation.) The intersection point is the limit, beyond which no tree can grow.
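The method is easy to sketch in code. This is a toy model, not Newton's calculation: the coefficients below are invented purely so that the curves cross at a convenient height.

```ruby
# Toy model: weight grows with the cube of height (volume), strength
# with the square (cross-section). Both coefficients are invented.
WEIGHT_COEFF   = 1.0
STRENGTH_COEFF = 300.0

def weight(h)
  WEIGHT_COEFF * h**3
end

def strength(h)
  STRENGTH_COEFF * h**2
end

# Walk upward until weight outruns strength -- the crossover is the limit.
max_height = (1..1000).take_while { |h| weight(h) <= strength(h) }.last
puts max_height   # => 300, with these invented coefficients
```

Whatever the real coefficients are, the shape of the argument is the same: a cube always overtakes a square, so a limit always exists.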

Let's move from trees to software. Are there limits to software? Can we calculate the maximum size of a program or system? Here the computations are more complex. I'm not referring to arbitrary limits such as the maximum modules a compiler can handle (although those limits seem to be relegated to our past) but to the size of a program, or of a system of programs.

It's hard to say that there are limits to the size of programs. Our industry, over the past sixty years, has seen programs and systems grow in size and complexity. In the early days, a program of a few hundred lines of code was considered large. Today we have systems with hundreds of millions of lines of code. There seems to be no upper limit.

If we cannot identify absolute limits for programs or systems, can we identify limits for programming teams? It's easy to see that a team of one person is limited to the output of a single individual. That individual might be extremely talented and extremely hard-working, or might be an average performer. A team of programmers, in theory, can perform more work than a single programmer. By that logic, we could simply add programmers until we achieve the needed capacity.

Readers of The Mythical Man-Month by Fred Brooks will recognize the fallacy of that logic. Adding programmers to a team increases capacity, but it also increases the communication load. More programmers need more coordination. Their contributions increase linearly, but the coordination effort increases faster than linearly. (Metcalfe's law, which says that communication channels increase as the square of the participants, works against you here.) Once again you have a graph with two curves, and at some point they cross. Beyond that point, your project spends more time communicating than coding, and each additional programmer costs more than they produce.
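The same crossover arithmetic applies to teams. The units below are invented, and tuned so that the toy model peaks near Brooks's figure of seven; real numbers would have to come from measuring your own team.

```ruby
# Toy model: each programmer produces 10 units of work, and each
# pairwise communication channel costs 1.6 units. Both figures are made up.
OUTPUT_PER_PROGRAMMER = 10.0
COST_PER_CHANNEL      = 1.6

def net_output(n)
  channels = n * (n - 1) / 2.0   # channels grow as the square of n
  n * OUTPUT_PER_PROGRAMMER - channels * COST_PER_CHANNEL
end

best_size = (1..100).max_by { |n| net_output(n) }
puts best_size   # => 7 with these made-up figures
```

Past that peak, each additional programmer's coordination cost exceeds his contribution.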

I don't have numbers. Brooks indicated that a good team size was about seven people. That's probably a shock to the managers of large, multi-million LOC projects and their teams of dozens (hundreds?) of programmers. Perhaps Brooks is wrong, and the number is higher.

The important thing is to monitor the complexity. Knowing the trend helps one plan for resources and measure efficiency. Here's my list of important factors. These are the things I would measure:

- The complexity of the data
- The complexity of the operations on the data
- The power of the programming language
- The power of the development tools (debuggers, automated tests)
- The talent of people on the team (programmers, testers, and managers)
- The communication mechanisms used by the team (e-mail, phone, video conference)
- The coordination mechanisms used by the team (meetings, code reviews, documents)
- The rate at which changes are made to the code
- The quality of the code
- The rate at which code is refactored

The last two factors are often overlooked. Changes made to the code can be of high or low quality. High-quality changes are elegant and easy to maintain. Low-quality changes get the work done, but leave the code difficult to maintain. Refactoring improves the code quality while keeping the feature set constant. Hastily made changes often leave you in a technical hole. These two factors measure the rate at which you are climbing out of the hole. If you aren't measuring these two factors, then your team is probably digging the hole deeper.

So, as a manager, are you measuring these factors?

Or are you digging the hole deeper?


Wednesday, September 23, 2009

Why IT has difficulty with estimates

Estimating has always been a difficult task in IT, especially for development efforts. How long will it take to write the program? How much will it cost? How many people do we need? For decades, we have struggled with estimates and project plans. Development projects run over their allotted time (and over their allotted budget). Why?

I observe that the problem with estimates is on the development side of IT. The other major parts of IT, support and operations, have loads that can be reliably estimated. For support, we have experience with the number of customers who call and the complexity of their issues. For operations, we have the experience of nightly jobs and the time it takes to run them. It's only on the development side, where we gather requirements, prepare designs, and do the programming that we have the problem with estimates. (I'm including testing as part of the development effort.)

The process of estimation works for repeated tasks. That is, you can form a reasonable estimate for a task that you have performed before. The more often you have performed the task, the better your estimate.

For example, most people have very good estimates for the amount of time they need for their morning commute. We know when to leave to arrive at the office on time. Every once in a while our estimate is incorrect, due to an unforeseen event such as traffic delays or water main breaks, but on average we do a pretty good job.

We're not perfect at estimates. We cannot make them out of nothing. We need some initial values, some basis for the estimate. When we are just hired and are making our first trips to the new office, we allow extra time. We leave early and probably arrive early -- or perhaps we leave at what we think is a good time and arrive late. We try different departure times and eventually find one that works for us. Once we have a repeating process, we can estimate the duration.

Hold that thought while I shift to a different topic. I'll come back to estimates, I promise.

The fundamental job of IT is to automate tasks. The task could be anything, from updating patient records to processing the day's sales transactions. It could be monitoring a set of servers and restarting jobs when necessary. It could be serving custom web pages. It is not a specific kind of task that we automate, it is the repetition of *any* task.

Once we identify a repeating task, we automate it. That's what we do. We develop programs, scripts, and sometimes even new hardware to automate well-defined, repeating tasks.

Once a task has been automated, it becomes part of the operation. As an operation task, it is run on a regular schedule with an expected duration. We can plan for the CPU load, network load, and other resources. And it is no longer part of the development task set.

The repeating tasks, the well-defined tasks, the tasks that can be predicted, move to operations. The tasks remaining for development -- the ones that need estimates -- are the ones that are not repeating. They are new. They are not well-defined. They cover unexplored territory.

And here's where estimates come back into the discussion. Since we are constantly identifying processes that can be automated, automating them, and moving them from development to operations, the well-defined, repeatable tasks fall out of the development basket, leaving the ill-defined and non-repeating tasks. These are the tasks that cannot be estimated, since they are not well-defined and repeating.

Well, you *can* write down some numbers and call them estimates. But without experience to validate your numbers, I'm not sure how you can call them anything but guesses.


Monday, September 21, 2009

Your data are not nails

Data comes in different shapes, sizes, and with different levels of structure. The containers we select for data should respect those shapes and sizes, not force the data into a different form. But all too often, we pick one form for data and force-fit all types of data into that form. The result is data that is hard to understand, because the natural form has been replaced with the imposed form.

This post, for example, is small and has little structure (beyond paragraphs, sentences, and words). The "natural" form is the one you're reading now. Forcing the text into another form, such as XML, would reduce our comprehension of the data. (Unless we converted the text back into "plain" format.)

One poor choice that I saw (and later changed) was the selection of XML for build scripts. It was a system that I inherited, one that was used by a development team to perform the compile and packaging steps for a large C++/MFC application. 

The thinking behind the choice of XML was twofold: XML allowed for some structure (it was thought the scripts would need some), and XML was the shiny new thing. (There were some other shiny new things in the system, including Java, a web server, RMI, EJB, and reflection. It turns out that I got rid of all of the shiny things and the build system still worked.)

I can't blame the designers for succumbing to XML. Even Microsoft has gone a bit XML-happy with their configuration files for projects in Visual Studio.

It's easy to pick a single form and force all data into that form. It's also comfortable. You know that a single tool (or application) will serve your needs. But anyone who has used both word processors and spreadsheets knows that the form of the data shapes how well we understand it.

Some data is structured, some is free-flowing. Some data is large, some is small. Some data consists of repeated, identical structures; other data consists of items that each have a structure of their own.

For build scripts, we found that text files were the most understandable, most flexible, and most useful form. Scripts are (typically) of moderate size. Converting the scripts from XML to text shrank them from 20,000 lines to about 2,200 lines. The smaller scripts were much easier to maintain, and the time for simple changes dropped from weeks to hours. (Mostly for testing. The time for the script changes themselves dropped to minutes.)
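To give a feel for the difference, here is one hypothetical build step in both forms. The XML shape is invented for illustration, but it is typical of the scripts we inherited:

```
# The plain text form: one line.
compile src/main.cpp -o build/main.o

# The same step, force-fit into XML: five lines.
<step>
  <action>compile</action>
  <input>src/main.cpp</input>
  <output>build/main.o</output>
</step>
```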

Small data sets with little or no structure fit well in text files -- or possibly INI files, which have a little more structure to them.

Small to medium data sets with heavy structure fit into XML files.

Large data sets with homogeneous items fit well in relational databases.

Large data sets with heterogeneous items fit better into network databases or graph databases. (The "NoSQL" movement can give you information about these databases.)

Don't think of all data as a set of nails, with your One True Format as the hammer. Use forms that make your team effective. Respect the data, and it will respect you.


Wednesday, September 16, 2009

My brain is now small

My brain is now smaller than the programming languages I use.

I started programming in 1976, on a PDP-8/e computer with timeshare BASIC. (Real BASIC, not this Visual Basic thing for wimps. BASIC had line numbers and variable names limited to a letter and an optional digit.)

Since then I have used various languages, operating systems, and environments. I've used HDOS, CP/M, and UCSD p-System on microcomputers (with BASIC, 8080 assembly language, and C), DECsystem-10s (with TOPS-10 and FORTRAN and Pascal), MS-DOS (with C and C++), and Windows (with C++, Java, Perl, and a few other languages).

In each case, I have struggled with the language (and run-time, and environment). The implementation has been the limiting factor. My efforts have been (mostly) beating the language/environment/implementation into submission to get the job done.

That has now changed.

I've been using the Ruby language for a few small projects. Not Ruby on Rails, which is the web framework that uses Ruby as the underlying language, but the Ruby language itself. It is a simple scripting language, like Perl or Python. It has a clean syntax and object-oriented concepts are baked in, not stuck on like in Perl. But that's not the most important point.

Ruby lets me get work done.

The language and run-time library are simple, elegant, and most importantly, capable. It lets me do work and stays out of my way. This is a pleasant change.

And somewhat frightening.

With Ruby, the limiting factor is not the language. The limiting factor is my programming skills. And while my programming skills are good, they are the result of working with stunted languages for the past thirty-odd years. The really neat stuff, the higher-order programming concepts, I have not learned, because the languages that I used could not support them.

Now I can.
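For example (a toy of my own devising, not from any project): in Ruby, functions can build and return other functions, something the BASIC, FORTRAN, and C of my earlier years never offered.

```ruby
# A function that builds functions.
adder = lambda { |n| lambda { |x| x + n } }
add_five = adder.call(5)
puts add_five.call(10)   # => 15

# Blocks make the same idea pervasive in the standard library.
squares = (1..5).map { |x| x * x }
p squares                # => [1, 4, 9, 16, 25]
```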

I'm frightened, and excited.


Monday, September 14, 2009

Destination Moon

I recently watched the classic movie "Destination Moon". In this movie, the government convinces a set of corporations to design, engineer, and build a rocket that can fly to the moon. It's an interesting movie, albeit a period piece with its male-dominated cast, suits and ties, and special effects. It is technically accurate (for 1950) and tells an enchanting tale.

What is most interesting (and possibly scary) is the approach to project management. The design team, with experience in airplanes and some with stratospheric rockets, builds a single moon rocket in a single project. That's a pretty big leap.

The actual moon missions, run by NASA in the 1960s, took smaller steps. NASA ran three different programs (Mercury, Gemini, and Apollo) each with specific goals. Each program consisted of a number of missions, and each mission had a lot of work, including tests.

Yet for the movies, the better tale is of some bright industrial engineers who build a rocket and fly it to the moon. The movie heightens the sense of suspense by skipping the tests. It's much more exciting to launch an *untested* rocket and fly it to the moon!

For big projects, NASA has the better approach: small steps, test as you go, and learn as you go. That goes for software projects too.

Anyone involved in software development for any reasonable length of time has been involved in (or near to) a "big-bang" project. These projects are large, complex, and accompanied by enthusiasm and aggressive schedules. And often, they are run like the project in "Destination Moon": build the system all at once. The result is usually disappointing, since large complex systems are, well, large and complex. And we cannot think of everything. Some things (often many things) are left out, and the resulting system is slow, incomplete, and sometimes incorrect.

Agile enthusiasts call this the "Big Design Up Front" method, and consider it evil. They prefer to build small portions, test the portions, and learn from them. They would have been comfortable on the NASA programs: fly the Mercury missions first and learn about orbiting the earth, then fly the Gemini missions and learn about docking and working outside the ship, and finally fly the Apollo missions and go to the moon. On each mission, the folks at NASA learned more about the problems and found solutions. The Apollo missions had problems of their own, but with the knowledge from the earlier missions, NASA was able to succeed.

The "Destination Moon" style of project management is nice for movies, but for real-life projects, NASA has the better approach.


Friday, September 11, 2009

Opening the gates for Windows

The SystemMax PC from TigerDirect arrived today. This is a 2.4 GHz dual core, 2 GB RAM, 160 GB DASD unit. I ordered it for Windows 7, and set the PC up this afternoon.

Installing Windows 7 RC was fairly easy. Microsoft has made the install just as good as the installs for most Linux distros. The PC started, booted off the DVD, started the install program, detected the video card and network interface, asked a few questions, and ran for about ten minutes. The process did require two restarts. When it was finished, I had Windows 7 running.

I downloaded and installed the Microsoft Word Viewer and Microsoft Excel Viewer. I also downloaded and installed Visual C# Express and Web Developer Express. They included SQL Server Express and the Silverlight runtime libraries.

This is a big event in my shop. I banished Windows back in 2005 and used nothing but Linux since then. Now Windows is back on my network.

Why allow Windows? To get experience with Microsoft SQL Server. Lots of job posts list it as a required skill. I will do what it takes to get a good job.


Monday, September 7, 2009

Application Whitelist Blues

The latest security/compliance tool for desktop PCs is the application whitelist. This is an administration utility that monitors every application that runs (or attempts to run) and allows only those applications that are on the approved list.

Companies like Bit9 sell application whitelist tools. They have slick advertising and web pages. Here's a quote from Bit9's web page: "Imagine issuing a computer to a new employee and getting it back two years later - with nothing else installed on it but the applications and patches you had centrally rolled out." Appealing, no?

Whitelists will certainly prevent the use of unapproved programs. CIOs will have no worries about their team using pirated software. CIOs will know that their team is not using software that has been downloaded from the internet or smuggled in from home. They will have no worries about their workers bringing in virus programs on USB drives.

Of course, application whitelist techniques are incomplete. They claim to govern the use of programs. But what is a program?

Certainly .EXE files are programs. And it's easy to convince people that .DLL files are executables. (Microsoft considers them as such.)

Java .class files are programs. They must be "run" by the JVM and not directly by the processor, but they are programs. So are .NET executables (which usually sport the extension .EXE); they are "executed" by Microsoft's CLR, the counterpart to the Java JVM. Application whitelists will probably govern individual .NET programs (since they are launched with the .EXE extension) but not individual Java applications. They will probably stop at the JVM.

Perl, Python, and Ruby programs are text and run by their interpreters. They are compiled into bytecode like Java and .NET programs, but as they are read, not in advance. They can be considered programs. I'm fairly sure that application whitelists won't govern individual Perl scripts; they will allow or disallow the Perl interpreter. If you can run Perl, then you can run any Perl script.
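The interpreter loophole is easy to demonstrate. Sketched in Ruby (the file name is hypothetical; any text file works the same way): once the interpreter is on the whitelist, the whitelist never sees the script itself.

```ruby
# Any text file becomes a runnable program once the interpreter is
# approved -- the whitelist checks the interpreter, not this file.
File.write("innocent_looking.txt", "puts 2 + 2")
eval(File.read("innocent_looking.txt"))   # prints 4
File.delete("innocent_looking.txt")
```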

And what about spreadsheets? Spreadsheets are not too far from Perl scripts, in terms of their program-ness. Spreadsheets are loaded and "executed" by an "interpreter" just like a Perl script is loaded and executed. I doubt that application whitelists will govern individual spreadsheets.

From what I can see, programs like Bit9's "Parity" application whitelist monitor .EXE files and nothing more.

OK, enough with the philosophical discussion of applications.

Here's the real danger of application whitelists: if they govern the use of programs (whatever we agree are programs), and the lists are created by governance committees (or software standards committees, or what have you), then they limit the creativity of the entire organization. The entire organization will be constrained to the approved list of software. No one in the organization will be able to think outside the approved-software box.

The entire organization will be as creative as the governance committee lets them be.

And not one bit more.