Tuesday, February 9, 2010

Error messages as metric

Here's an idea for evaluating an organization: measure their error messages.

I picked the word "measure" in that sentence to let you define the appropriate metric. It could be a raw count or the complexity of messages. Or the absence of them. (An organization with lots of error messages in its code says one thing. An organization with few or no messages says something else.)

This task is perhaps more thought experiment than feasible exercise. I suspect that most organizations are ill-prepared to simply hand over their error messages, even if they so desired. Error messages get tucked away in the darndest places, and any large system (comprising multiple programs) will have messages in different forms. Messages can be hard-coded, stored in resource files, read from external text files, and generated on the fly.

Yet an evaluation of error messages may be of value. Error messages are presented to the user, and are a form of communication. I suspect that they are less regulated by the marketing arms of organizations than other forms of communication, such as web pages and e-mail updates. They are sent to the user only when something goes wrong (usually the fault of the user, but not always). They are not in the forefront of marketing.

Here are the aspects that I would look at:

- Are the messages accurate? Do they present the correct information for the situation?
- Are they spelled correctly? Do they have correct grammar?
- Are they specific? Do they present details, or do they present general text such as "Required field missing"?
- Do they recommend an action? Or do they assume that the user will know what to do?
- Are there lots of them? Too many, perhaps?
- Are there too few?
- Are there messages for situations that can never occur (much like "dead source code")?
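A raw count, at least, is approachable mechanically. Here's a minimal sketch in Python that counts string literals on lines that appear to raise or log errors. The patterns it looks for (raise, error, warn, fail) are assumptions about a codebase's conventions, not a general parser -- adjust them for your own system:

```python
import re

# Heuristic patterns -- assumptions about how errors are reported,
# not a real parser. Tune these for your codebase's conventions.
ERROR_CALL = re.compile(r'\b(raise|error|warn|fail)\w*\s*\(?', re.IGNORECASE)
STRING_LIT = re.compile(r'"([^"\\]|\\.)*"|\'([^\'\\]|\\.)*\'')

def count_error_messages(source_text):
    """Count candidate error-message string literals in source text."""
    count = 0
    for line in source_text.splitlines():
        if ERROR_CALL.search(line):
            # Each string literal on an error-reporting line is a candidate.
            count += len(STRING_LIT.findall(line))
    return count
```

Run it over each source file and sum the results. The number is crude, but crude numbers are enough to compare one codebase (or one release) against another.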

Error messages may tell us a lot about the organization's view of its relationship to its users. The home page will have pleasant (or impressive) descriptions and pictures of smiling people. But anyone can have a welcoming or impressive home page.

Show me the messages!


Friday, February 5, 2010

Convergence in the cloud

The old convergence of PCs and TVs is underway. While WebTV was not the next big thing, YouTube and Netflix have shown that the two technologies can work together. From here on out, the TV/PC convergence is a "done deal". On to the next convergence!

I think the next convergence will be in the computing realm.

We have multiple models of processing: stand-alone applications, client/server, web apps, smartphone apps, and (coming soon to a theater near you) cloud apps. These "platforms" are a varied lot, with different UI capabilities and different strengths.

The converged system will blend the capabilities of those different platforms and allow apps to move (or be exposed) at different levels. Some apps will work on a few levels, others will move to all levels.

This idea is not new. Once, accounting systems were established on mainframes, but other levels of computing built bridges and inroads to extract data. Some of the solutions were hacked together from extract tapes, 3740 floppy discs, and report writer output, but they were useful. The advantages of sharing data, sharing information, are too great to leave applications at any single level.

I see the converged platform being cloud-based, with virtual processors and languages that use virtual machines, and interfaces that vary as they move from level to level. The "local PC" will be a cloud host, running a scaled-down version of the application. A plain application will run in the cloud -- perhaps your private cloud, but a cloud. 

Your smartphone will run its own cloud (or perhaps talk to a cloud) for its processing.

The notion of a plain executable will go away. There will be a generic processor, one that is present everywhere, delivered through processor virtualization.

The dominant languages will be those that can live in the cloud. LISP, Ruby, and possibly C# will be the popular language choices. Elder languages such as COBOL and FORTRAN will be supported in emulators, interpreters, and translators. (Perhaps a just-in-time translator from COBOL to Java, and then an emulator for Java on the virtual processor.)

User interfaces will move further away from core processing. An application will talk to an interface or multiple interfaces. The interface will handle the specific device; your e-mail program (if we still use e-mail) will run on your cell phone, your tablet, your desktop, and your mainframe, using a "virtual interface" to present information.

Applications will be able to float and move from level to level. You can start a program on your desktop PC, transfer it to your cell phone as you commute to the office, and then use your tablet PC in a park at lunch.

We won't care about the processor (sorry, Intel!) or the supporting operating system (sorry, Microsoft!).

We *will* care about the brand and type of cloud. I expect that there will be multiple cloud vendors, with offerings that are not necessarily compatible. Microsoft is working on "Azure", Google has its "App Engine", and Amazon has its offerings. Look for Oracle to announce something soon. (And they had better be working on it, or they will be reduced to a minor player.) Also look for clouds from overseas. I expect China to create one, and another from a European consortium. India and Brazil may create clouds of their own.

We can expect clouds of different sizes. Clouds will be available for the general public, national governments (and state and local governments), corporations, and individuals. You can run a cloud on your PC and talk to your account in the public cloud at the same time.

Your choice of cloud will define your business capabilities, just as your choice of operating system today dictates your capabilities.

Clouds will be hacked and attacked, just as computers are today. But in the future, once we find an attacker, we will be able to say:

"Hey, you! Get off of my cloud!"


Sunday, January 31, 2010

How much code is too much code?

As an industry, we have over half a century of experience. We should, by now, be in a position to use that experience to decide on "good" and "bad" software. One dimension is size. Let's explore that aspect of software.

Certainly, software is needed. Lines of code are required to get the work done. Without code, nothing happens.

Yet too much code is a bad thing. Smaller programs are easier to understand than larger programs. And easier to modify. And to fix. "A simple program has obviously no defects. A complex program has no obvious defects."

So how do we reduce the size of our code? The traditional methods are: eliminate duplicate code, re-factor to combine similar sections, and re-write to use better coding techniques. These are all laudable, but insufficient.

Here's what you need to do:

First, decide that you care about the size of your source code. In my experience, most project managers (and most programmers) care little, if at all, about the size of the source code. Savvy managers realize that there is a correlation between the size of source code and the quality of the result. They also realize that size has an effect on their team's ability to meet new requirements. But most managers care only for the end result of delivery on time, with as few defects as possible, and ignore the size of the code.

Second, decide that you want to measure your source code. This implies wanting to monitor the code size and take action as you get the measurements. Many project plans are formed up front, with no slack for adjustments. If you're not going to change the project as you get the information, then why bother collecting it?

Third, decide on the measurement frequency. You must measure frequently enough to make a difference, but not so frequently that you waste resources. The measurement must feed into your OODA (Observe-Orient-Decide-Act) loop. Measuring once per year is probably too infrequent. Measuring every day is probably too frequent.

Fourth, pick a measurement. There are different ways to measure source code, each with its advantages. Here are a few:

- Lines Of Code (LOC): a raw count of the lines of source code, easily obtained with tools like wc. Advantages: it's easy to do -- very easy. Disadvantages: it doesn't account for blank lines or comments, nor does it measure complexity. (I can write a complex algorithm in 20 lines of code or 200 lines, but which is better? The short one may be hard to understand. Or the long one may be inefficient.)

- Source Lines Of Code (SLOC): a count of just the source code, omitting blank lines and comments. Advantage: a more accurate measure than LOC. Disadvantage: harder to do -- you need filters before sending source into wc.

- Function points: a measure of complexity of the task, not the code. Advantage: better for comparing different projects (especially those that use disparate technologies). Disadvantage: much, much harder to compute. Perhaps so much harder that the effort to derive these numbers costs more than the benefits.

- Complexity measure (McCabe or others): a measure of the complexity in the code. Advantages: good for identifying complex areas of code, and comparing different projects. Disadvantages: hard to derive.
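The first two measures can be sketched in a few lines. Here's a minimal LOC/SLOC counter in Python; the comment-prefix parameter is an assumption of this sketch (it handles line comments only, not block comments):

```python
def measure_source(text, comment_prefix="#"):
    """Return (loc, sloc) for a chunk of source text.

    loc  -- the raw line count, what wc would report
    sloc -- lines that are neither blank nor line comments

    The comment prefix is a parameter because it varies by language;
    this sketch does not handle block comments or strings.
    """
    lines = text.splitlines()
    loc = len(lines)
    sloc = sum(1 for line in lines
               if line.strip() and not line.strip().startswith(comment_prefix))
    return loc, sloc
```

For a C-family language you would pass "//" as the prefix. Block comments would need a real parser -- which is exactly the kind of extra effort that the function-point and complexity measures demand in full.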

Fifth, decide on your goal. A capricious goal of "reduce code to half its current size" is foolish. How do you know your code is too big? (Yes, it probably is larger than it needs to be, based on what I've seen with various software efforts. But how do you know that a reduction to half its current size is wise, or even possible?) A different goal, and one that may be more useful, is to improve understanding. Here are some ideas:

- For projects that use multiple languages, understand the relative size of the language code bases. How much of your project is in C++? How much in C#? And how much in Visual Basic?

- For projects with components or libraries, understand the relative size of the different components. How does the size of source code compare with the size of the development teams?

- One measurement you can take (with enough effort) is the rate of change -- not just an absolute size. Identifying the code that changes most frequently lets you identify the "hot spots" of your code. These areas may be at the most risk for defects.
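That rate-of-change measurement is easy to approximate if your history lives in a version control system that can list the files touched by each commit. Here's a sketch that tallies file names from `git log --name-only --pretty=format:` output -- git is my assumption here, not something the measurement requires:

```python
from collections import Counter

def change_hot_spots(log_text, top=5):
    """Tally file paths from `git log --name-only --pretty=format:` output.

    With that pretty format, git emits only file names, one per line,
    separated by blank lines between commits. The most frequently
    changed files are the candidate "hot spots".
    """
    counts = Counter(line.strip() for line in log_text.splitlines()
                     if line.strip())
    return counts.most_common(top)
```

Feed it the captured log output and the files at the top of the list are the ones to watch -- and perhaps the first candidates for size and complexity measurement.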

I haven't answered the question of "how much is too much". There is no one answer. But with measurements, you may be able to answer the question for yourself.


Wednesday, January 27, 2010

iPad: an off-by-one error

Apple released the iPad today, to responses that varied from "hooray" to "yawn". The cheers came from the Apple fans, who cheer any announcement from Cupertino. The yawns came from the rest of us, who may be wondering about Apple's strategies.

Here's how I see the iPad: It's Apple's response to the netbook market. Netbook computers have gained popularity over the past few years, and the lack of a netbook-class offering in Apple's products is obvious. Rather than produce a smaller MacBook, Apple has chosen the route of a computer without a keyboard. Which is all very good, but also so very 2008.

The yawns from the audience are caused by Apple's focus on netbooks, which are now the old thing. Netbooks are here and part of the ecosystem. The new things are e-books and e-book readers, like the Kindle and the Nook. (I still think that those are the silliest names on the planet -- but that's not the point.)

Apple addressed last year's hot item, not the current hot item. Yeah, they have a nice-looking tablet computer that can display e-books, and they have an agreement with some publishers. But they are not competing in the market.


Sunday, January 24, 2010

Linux updates vs. Windows updates

This past week, Mozilla released a new version of the Firefox web browser. (Version 3.6, for those who are interested.)

Along with the announcement, various pundits and commentators recommended that we update our Firefox to the new version.

For a while, I had to think about that recommendation. Not because I disagree with it (I believe in staying current with all software, including browsers, compilers, and operating systems) but because I didn't understand it. Why was it necessary to make such a recommendation?

After some pondering, and then some use of my computers at home (one of which runs Windows), I understood.

Microsoft Windows has a built-in update system, but it works only with Microsoft products. (And not necessarily all Microsoft products. It won't update you if the new version requires a new license.)

I've been living in the Linux world. The Linux distros all have update mechanisms -- at least the distros that I have been using, which include SuSE, Ubuntu and its variants, and Debian. The update mechanisms update all of the installed software. If I've selected Firefox, the distros (SuSE, Ubuntu, and Debian) will all install a new version for me. I don't have to do anything -- except run the updates. (And they kick in automatically, but politely ask permission to install the new versions.)

Side note: The different distros run on their own schedules. Some distros send updates immediately, others take a few weeks. Ubuntu is fairly fast, SuSE is a bit slower, and Debian is slower still. So I don't have the latest and greatest versions of everything immediately. But they do get there, eventually.

In the Windows world, the Microsoft updates handle Microsoft products, but beyond that you are on your own. If you want a new version of a non-Microsoft product, the Microsoft update does not help you.

(I have heard of a Linux-like update for Microsoft Windows, one that installs the latest version of open source products, but I have been unable to find it.)

So now I understand. With Linux, I get the latest version, "for free" since I need take no special action. With Windows, I get the latest version of Microsoft software (except when it costs money) and for other software I am on my own.


Sunday, January 17, 2010

Version control is not about versions

We call it version control, but it's really not about version control.

In the distant past, we lived without version control. (Or automated tests. Or e-mail.)

While programs were written by single individuals, things mostly worked. But even a single individual can make mistakes, and a previous version of the program is helpful. The ability to "go back in time" is useful, and a copy of yesterday's version (or last week's, or last year's) is easy enough to keep.

Once program teams became larger than a single person, we needed ways to coordinate the activities of the different team members. It was all too easy to store the code in a common place, let everyone make changes, and have interleaving reads and writes that lost changes.

Thus version control was invented. Version control did two things: it kept previous versions of the code (providing the "back in time" capability for everyone) and it serialized access to files, so all changes were kept.

Modern, professional projects today use version control of some sort. (SourceSafe, CVS, Subversion, Git, ... the list is a long one.) People think that it is a necessity (which it is) but don't understand the real reason for selecting a version control system.

Most people think that version control systems (VCSs) are about versions. They focus on the "back in time" capability of a VCS.

But VCS is about more than "back in time".

Your version control system defines how the team works. It sets the rules for interactions between team members. If you select a "locking" VCS (or configure one that way), you define a process that forces team members to wait for files to become "unlocked" to check in their changes. If you define a process that does not lock files, then you force team members to merge changes with concurrent changes.

The VCS can grant and restrict access to different branches of the storage "tree". You can use this feature, or you can allow all team members access to every area. In the latter case, you trust team members to make the right changes. In the former case, you don't trust your team members.

Here's the pattern that I've seen: shops that use agile methods (daily builds and automated tests, specifically) put less emphasis on their VCS. They let all team members check in files anywhere. They expect team members to run tests prior to check-in. They expect their team members to do the right things. They trust their teammates.

Teams that use waterfall methods (infrequent builds, limited or no automated tests) use their VCS as a control mechanism. They restrict access to certain areas, to let only the select trusted few make changes in those areas. They expect their team members to do the wrong things, and use their VCS to guard against mistakes. They do not trust their teammates.

I'm not saying that one of these is better than the other. (Although in my personal experience I prefer the former.) Both have their advantages. Which you choose depends on your management style and your trust in your people.

Version control is about the team, how it interacts, and how much it trusts itself. And how much the managers trust the team.


Sunday, January 10, 2010

Microsoft steps forward to 1970

I recently came across Microsoft's white paper for spreadsheet compliance. It has the impressive title "Spreadsheet Compliance in the 2007 Microsoft Office System".

Published in 2006, it describes a set of techniques that can be used to ensure the storage and management of spreadsheets comply with regulations.

The document is clearly aimed at corporate managers. It begins with an executive summary, describing the risks of poorly managed spreadsheets and the legal ramifications of noncompliance with regulations.

The good news is that Microsoft recognizes that spreadsheets are important to organizations, contain critical data, and deserve proper design and management. (And if Microsoft recognizes that, then other companies are likely to follow.)

The bad news is that Microsoft recommends the use of waterfall methodology for spreadsheet development. The steps are: define requirements, design, implement, test and verify, and deploy. Microsoft includes a footnote that indicates this process for spreadsheets was based on the waterfall model for software development. (Or perhaps Microsoft does not recommend this model. The text reads "Here is a recommended development approach to creating spreadsheets", in the passive voice. Microsoft, aside from publishing this white paper, makes no direct recommendation for this method. But their white paper recommends it, so Microsoft recommends it. And neither the paper nor Microsoft presents alternatives.)

My first thought on this recommendation was "Does Microsoft really believe that design-up-front waterfall methods are good ways to design spreadsheets?" I find it hard to believe, since Microsoft must be using agile techniques for their products -- or at least some of them. The waterfall model -- specifically the big-design-up-front, everything-in-one-cycle approach -- is simplistic and naive. The plan looks good on paper, and it promises delivery on a specific date, but for anything beyond trivial projects it fails.

Perhaps Microsoft doesn't believe that the waterfall method is appropriate, but instead thinks that their customers believe it to be appropriate. I find this more reasonable; many companies -- especially big companies -- use waterfall for their development cycle. So Microsoft recommends not what it thinks best, but what it thinks will sell.

Or perhaps this development method is not important. The white paper goes on to explain the features in various Microsoft products that can help companies manage their spreadsheets. I specifically say "companies" because the products are enterprise solutions, not suitable for individuals or small companies. In this case, the white paper is not a solution for problems, but marketing material.

In any case, the impression of Microsoft is not flattering.