Wednesday, September 23, 2009

Why IT has difficulty with estimates

Estimating has always been a difficult task in IT. Especially for development efforts. How long will it take to write the program? How much will it cost? How many people do we need? For decades, we have struggled with estimates and project plans. Development projects run over allotted time (and over allotted budget). Why?

I observe that the problem with estimates is on the development side of IT. The other major parts of IT, support and operations, have loads that can be reliably estimated. For support, we have experience with the number of customers who call and the complexity of their issues. For operations, we have experience with the nightly jobs and how long they take to run. It's only on the development side, where we gather requirements, prepare designs, and do the programming, that we have trouble with estimates. (I'm including testing as part of the development effort.)

The process of estimation works for repeated tasks. That is, you can form a reasonable estimate for a task that you have performed before. The more often you have performed the task, the better your estimate.

For example, most people have very good estimates for the amount of time they need for their morning commute. We know when to leave to arrive at the office on time. Every once in a while our estimate is incorrect, due to an unforeseen event such as traffic delays or water main breaks, but on average we do a pretty good job.

We're not perfect at estimates. We cannot make them out of nothing. We need some initial values, some basis for the estimate. When we are just hired and are making our first trips to the new office, we allow extra time. We leave early and probably arrive early -- or perhaps we leave at what we think is a good time and arrive late. We try different departure times and eventually find one that works for us. Once we have a repeating process, we can estimate the duration.

Hold that thought while I shift to a different topic. I'll come back to estimates, I promise.

The fundamental job of IT is to automate tasks. The task could be anything, from updating patient records to processing the day's sales transactions. It could be monitoring a set of servers and restarting jobs when necessary. It could be serving custom web pages. It is not a specific kind of task that we automate; it is the repetition of *any* task.

Once we identify a repeating task, we automate it. That's what we do. We develop programs, scripts, and sometimes even new hardware to automate well-defined, repeating tasks.

Once a task has been automated, it becomes part of operations. As an operations task, it runs on a regular schedule with an expected duration. We can plan for the CPU load, network load, and other resources. And it is no longer part of the development task set.

The repeating tasks, the well-defined tasks, the tasks that can be predicted, move to operations. The tasks remaining for development -- the ones that need estimates -- are the ones that are not repeating. They are new. They are not well-defined. They cover unexplored territory.

And here's where estimates come back into the discussion. Since we are constantly identifying processes that can be automated, automating them, and moving them from development to operations, the well-defined, repeatable tasks fall out of the development basket, leaving the ill-defined and non-repeating tasks. These are the tasks that cannot be estimated, since they are neither well-defined nor repeating.

Well, you *can* write down some numbers and call them estimates. But without experience to validate your numbers, I'm not sure how you can call them anything but guesses.


Monday, September 21, 2009

Your data are not nails

Data comes in different shapes and sizes, and with different levels of structure. The containers we select for data should respect those shapes and sizes, not force the data into a different form. But all too often, we pick one form and force-fit every type of data into it. The result is data that is hard to understand, because the natural form has been replaced with an imposed one.

This post, for example, is small and has little structure (beyond paragraphs, sentences, and words). The "natural" form is the one you're reading now. Forcing the text into another form, such as XML, would reduce our comprehension of the data. (Unless we converted the text back into "plain" format.)

One poor choice that I saw (and later changed) was the selection of XML for build scripts. It was a system that I inherited, one that was used by a development team to perform the compile and packaging steps for a large C++/MFC application. 

The thinking behind the choice of XML was twofold: XML allowed for some structure (it was thought there would be some), and XML was the shiny new thing. (There were some other shiny new things in the system, including Java, a web server, RMI, EJB, and reflection. It turns out that I got rid of all of the shiny things and the build system still worked.)

I can't blame the designers for succumbing to XML. Even Microsoft has gone a bit XML-happy with their configuration files for projects in Visual Studio.

It's easy to pick a single form and force all data into that form. It's also comfortable: you know that a single tool (or application) will serve your needs. But anyone who has used both word processors and spreadsheets knows that the form of the data shapes our understanding of it.

Some data is structured, some is free-flowing. Some data is large, some is small. Some data consists of many items that share a single structure; other data has items that each carry a structure of their own.

For build scripts, we found that text files were the most understandable, most flexible, and most useful form. Scripts are (typically) of moderate size. Converting the XML scripts to text shrank them from 20,000 lines to about 2,200. The smaller scripts were much easier to maintain, and the time for simple changes dropped from weeks to hours. (Most of that time was testing; the script changes themselves took minutes.)
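To give a sense of the difference (I don't have the original scripts in front of me, so this fragment is invented, in the same spirit), a single compile step in the XML form looked something like this:

    <step name="compile-reports">
      <tool>cl.exe</tool>
      <option>/nologo</option>
      <option>/EHsc</option>
      <input>reports.cpp</input>
      <output>reports.obj</output>
    </step>

while the plain-text form said the same thing in one line:

    compile-reports: cl.exe /nologo /EHsc reports.cpp /Foreports.obj

Multiply that difference across every compile and packaging step, and the shrink from 20,000 lines to 2,200 is no surprise.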

Small data sets with little or no structure fit well in plain text files, or perhaps INI files, which impose a little more structure. (A short sketch below shows the idea.)

Small to medium data sets with heavy structure fit into XML files.

Large data sets with homogeneous items fit well in relational databases.

Large data sets with heterogeneous items fit better into network databases or graph databases. (The "No SQL" movement can give you information about these databases.)
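To make the first two cases concrete (the content here is invented), a handful of flat settings reads naturally as an INI file:

    [build]
    compiler = cl.exe
    warning_level = high
    output_dir = bin

while nested, repeating structure is where XML earns its keep:

    <customers>
      <customer id="1">
        <name>Example Co.</name>
        <order number="1001" total="25.00" />
        <order number="1002" total="117.50" />
      </customer>
    </customers>

The question is not "which format is newest" but "which format matches the shape of the data."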

Don't think of all data as a set of nails, with your One True Format as the hammer. Use forms that make your team effective. Respect the data, and it will respect you.


Wednesday, September 16, 2009

My brain is now small

My brain is now smaller than the programming languages I use.

I started programming in 1976, on a PDP-8/e computer with timeshare BASIC. (Real BASIC, not this Visual Basic thing for wimps. BASIC has line numbers and variable names limited to a letter and an optional digit.)

Since then I have used various languages, operating systems, and environments. I've used HDOS, CP/M, and UCSD p-System on microcomputers (with BASIC, 8080 assembly language, and C), DECsystem-10s (with TOPS-10 and FORTRAN and Pascal), MS-DOS (with C and C++), and Windows (with C++, Java, Perl, and a few other languages).

In each case, I have struggled with the language (and run-time, and environment). The implementation has been the limiting factor. My efforts have (mostly) gone into beating the language/environment/implementation into submission to get the job done.

That has now changed.

I've been using the Ruby language for a few small projects. Not Ruby on Rails, which is the web framework that uses Ruby as the underlying language, but the Ruby language itself. It is a simple scripting language, like Perl or Python. It has a clean syntax, and object-oriented concepts are baked in, not bolted on as in Perl. But that's not the most important point.

Ruby lets me get work done.
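For example (a throwaway script with a made-up file name, not from any real project), the kind of chore that used to mean a page of fiddly C++ is a few lines of Ruby:

    # Count word frequencies in a text file and print the ten most common.
    counts = Hash.new(0)
    File.read("notes.txt").downcase.scan(/[a-z']+/) { |word| counts[word] += 1 }
    counts.sort_by { |word, count| -count }.first(10).each do |word, count|
      puts "#{count}\t#{word}"
    end

No memory management, no type declarations, no build step: just the problem and its solution.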

The language and run-time library are simple, elegant, and, most importantly, capable. They let me do work and stay out of my way. This is a pleasant change.

And somewhat frightening.

With Ruby, the limiting factor is not the language. The limiting factor is my programming skills. And while my programming skills are good, they are the result of working with stunted languages for the past thirty-odd years. The really neat stuff, the higher-order programming concepts, I have not learned, because the languages that I used could not support them.

Now I can.

I'm frightened, and excited.


Monday, September 14, 2009

Destination Moon

I recently watched the classic movie "Destination Moon". In this movie, the government convinces a set of corporations to design, engineer, and build a rocket that can fly to the moon. It's an interesting movie, albeit a period piece with its male-dominated cast, suits and ties, and special effects. It is technically accurate (for 1950) and tells an enchanting tale.

What is most interesting (and possibly scary) is the approach to project management. The design team, with experience in airplanes and, in some cases, stratospheric rockets, builds a single moon rocket in a single project. That's a pretty big leap.

The actual moon missions, run by NASA in the 1960s, took smaller steps. NASA ran three different programs (Mercury, Gemini, and Apollo) each with specific goals. Each program consisted of a number of missions, and each mission had a lot of work, including tests.

Yet for the movies, the better tale is of some bright industrial engineers who build a rocket and fly it to the moon. The movie heightens the suspense by skipping the tests. It's much more exciting to launch an *untested* rocket and fly it to the moon!

For big projects, NASA has the better approach: small steps, test as you go, and learn as you go. That goes for software projects too.

Anyone involved in software development for any reasonable length of time has been involved in (or near to) a "big-bang" project. These projects are large, complex, and accompanied by enthusiasm and aggressive schedules. And often, they are run like the project in "Destination Moon": build the system all at once. The result is usually disappointing, since large complex systems are, well, large and complex. And we cannot think of everything. Something (often many things) gets left out, and the delivered system is slow, incomplete, and sometimes incorrect.

Agile enthusiasts call this the "Big Design Up Front" method, and consider it evil. They prefer to build small portions, test them, and learn from them. They would have been comfortable with NASA's approach: fly the Mercury missions first and learn about orbiting the earth and operating outside the ship, then fly the Gemini missions and learn about docking, and finally fly the Apollo missions and go to the moon. On each mission, the folks at NASA learned more about the problems and found solutions. The Apollo missions had problems of their own, but with the knowledge from earlier missions, NASA was able to succeed.

The "Destination Moon" style of project management is nice for movies, but for real-life projects, NASA has the better approach.


Friday, September 11, 2009

Opening the gates for Windows

The SystemMax PC from TigerDirect arrived today. This is a 2.4 GHz dual core, 2 GB RAM, 160 GB DASD unit. I ordered it for Windows 7, and set the PC up this afternoon.

Installing Windows 7 RC was fairly easy. Microsoft has made the install just as good as the installs for most Linux distros. The PC started, booted off the DVD, started the install program, detected the video card and network interface, asked a few questions, and ran for about ten minutes. The process did require two restarts. When it was finished, I had Windows 7 running.

I downloaded and installed the Microsoft Word Viewer and Microsoft Excel Viewer. I also downloaded and installed Visual C# Express and Web Developer Express. They included SQL Server Express and the Silverlight runtime libraries.

This is a big event in my shop. I banished Windows back in 2005 and have used nothing but Linux since then. Now Windows is back on my network.

Why allow Windows? To get experience with Microsoft SQL Server. Lots of job postings list it as a required skill. I will do what it takes to get a good job.


Monday, September 7, 2009

Application Whitelist Blues

The latest security/compliance tool for desktop PCs is the application whitelist. This is an administration utility that monitors every application that runs (or attempts to run) and allows only those applications that are on the approved list.

Companies like Bit9 sell application whitelist tools. They have slick advertising and web pages. Here's a quote from Bit9's web page: "Imagine issuing a computer to a new employee and getting it back two years later - with nothing else installed on it but the applications and patches you had centrally rolled out." Appealing, no?

Whitelists will certainly prevent the use of unapproved programs. CIOs will have no worries about their team using pirated software. CIOs will know that their team is not using software that has been downloaded from the internet or smuggled in from home. They will have no worries about their workers bringing in virus programs on USB drives.

Of course, application whitelist techniques are incomplete. They claim to govern the use of programs. But what is a program?

Certainly .EXE files are programs. And it's easy to convince people that .DLL files are executables. (Microsoft considers them as such.)

Java .class files are programs. They must be "run" by the JVM and not directly by the processor, but they are programs. So are .NET executables (which usually sport the extension .EXE); they are "executed" by Microsoft's CLR, the counterpart to the Java JVM. Application whitelists will probably govern individual .NET programs (since they are launched with the .EXE extension) but not individual Java applications. They will probably stop at the JVM.

Perl, Python, and Ruby programs are plain text, run by their interpreters. They are compiled into bytecode like Java and .NET programs, but as they are read, not in advance. They can be considered programs. I'm fairly sure that application whitelists won't govern individual Perl scripts; they will allow or disallow the Perl interpreter. If you can run Perl, then you can run any Perl script.
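A sketch of the gap (the script and the file paths are invented): once the interpreter is on the approved list, every script it runs goes unexamined, useful or otherwise.

    # grab_files.rb -- a script no governance committee ever reviewed
    require 'fileutils'
    Dir.glob('Z:/shared/reports/**/*.xls') do |file|
      FileUtils.cp(file, 'E:/usb_drive/')   # does its work under the approved interpreter
    end

    # run it with the whitelisted interpreter:
    #   C:\> ruby grab_files.rb

The whitelist sees only ruby.exe, which it approves; the script itself is just a text file.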

And what about spreadsheets? Spreadsheets are not too far from Perl scripts, in terms of their program-ness. Spreadsheets are loaded and "executed" by an "interpreter" just like a Perl script is loaded and executed. I doubt that application whitelists will govern individual spreadsheets.

From what I can see, programs like Bit9's "Parity" application whitelist monitor .EXE files and nothing more.

OK, enough with the philosophical discussion of applications.

Here's the real danger of application whitelists: if they govern the use of programs (whatever we agree counts as a program), and the lists are created by governance committees (or software standards committees, or what have you), then they limit the creativity of the entire organization. Everyone will be constrained to the approved list of software. No one in the organization will be able to think outside the approved-software box.

The entire organization will be as creative as the governance committee lets them be.

And not one bit more.


Wednesday, September 2, 2009

Microsoft thinks inside the document

The Microsoft world revolves around the word processor. Its collective mindset is one of individuals working on documents, with little communication between people. And why not? Microsoft Word (and Excel, and PowerPoint) made Microsoft successful.

The earliest text processors were written in the 1950s, but word processors as we know them today were created in the late 1970s. Products such as Electric Pencil and WordStar defined the notion of interactive word processing: a person typing on a keyboard and looking at a document on the screen. The term "WYSIWYG" came into use as batch text processors grew into interactive word processors.

The mental model is one of the struggling author, or the hard-bitten newspaper reporter, slogging away on a manual typewriter. It is the individual preparing a Great Work.

And this is the model that Microsoft follows. Most of its products follow the idea of an individual working on a document. (Whether those documents are Great Works or not remains to be seen.) Microsoft Word follows it. Microsoft Excel follows it, with the twist that the document is not lines of text but cells that can perform math. Microsoft Project and Microsoft PowerPoint follow the individual model, with their own slight twists. Even Visual Studio, Microsoft's IDE for programmers, uses this mental model. (Not every product does; SQL Server and Microsoft's on-line games, for example, are made for sharing data.) Even Sharepoint, the corporate intranet web/share/library system, is geared for storing documents created or edited by individuals.

The idea that Microsoft misses is collaboration. The interactive word processor is thirty years old, yet Microsoft still follows the "user as individual" concept. In the late 1970s, sharing meant exchanging floppy disks or posting files on a bulletin board. In 2009, sharing to Microsoft means sending e-mails or posting files on a Sharepoint site.

The latest product that demonstrates this is Sketchflow, a tool for analysts and user experience experts to quickly create mock-ups of applications. It's a nice idea: create a tool for non-programmers to build specifications for the development team.

Microsoft uses a Silverlight application to let a person build and edit a description. The user can specify windows (or pages, if you want to think of them that way); place buttons, text boxes, and other controls on the window; and link actions to buttons and connect pages into a sequence. (It sounds a lot like Visual Studio, doesn't it? But it isn't.)

I can see business analysts and web designers using Sketchflow. It makes sense to have a tool to quickly build a wireframe and let users try it out.

Microsoft misses completely on the collaboration aspect. Each Sketchflow project is a separate thing, owned by a person, much like a document in MS Word. Sharing means sending the project (probably by e-mail) to the reviewers, who use it and then send it back with notes. That works for one or maybe two reviewers, but with more than that, the coordination work becomes unmanageable. (Consider sending a document to ten reviewers, receiving ten responses, and then combining all of their comments, even with change-tracking enabled.)

There is no attempt at on-line collaboration. There is no attempt at multi-review comment reconciliation. The thinking is all "my document and your comments".

The word "collaboration" seems to be absent from Microsoft's vocabulary: a review of the Microsoft-hosted web sites that describe Sketchflow omit the word "collaboration" or any of its variants.

It's time that we take the "personal" out of "personal computer" and start thinking about collaboration. People work together, not in complete isolation. Google and Apple have taken small steps towards collaborative tools. Will Microsoft lead, follow, or at least get out of the way?