Monday, August 29, 2011

Getting old

Predicting the future of technology is difficult. Some technologies endure, others disappear quickly. Linux is twenty years old. Windows XP is ten, although bits of the Windows code base go back to Windows NT (and possibly Windows 3.1 or even MS-DOS). Yet the "CueCat", Microsoft "Bob", and IBM "TopView" all vanished in short order.

One aspect of technology is easy to predict: our systems will be a mix of old and new tech.

Technology has always been a mix of new and old. In the Eldar Days of PCs (the era of the Apple II, CP/M, and companies named Cromemco and Northstar), computer systems were blends of technology. Computers were powered by microprocessors that were low-end even in their day, chosen to keep costs down. We stored data on floppy disks (technology from the minicomputer age), on cassette tapes (a new twist on old, cheap hardware), and some folks used paper tape (tech from the 1960s).

Terminals were scrounged keyboards and displays; often the display was a television showing 40 uppercase characters per line. The better-off could afford systems with built-in equipment, like the Radio Shack TRS-80 and the Commodore PET.

The software was a mix. Most systems had some form of Microsoft BASIC built into ROM; advanced systems allowed for the operating system to be loaded from disk. CP/M was popular and new to the microcomputer era, but it borrowed from DEC's operating systems. Apple had its own DOS; Heathkit had its own HDOS but also supported CP/M and the UCSD p-System.

We all used BASIC. We knew it was from the timesharing era, despite Microsoft's extensions. When not using BASIC, we used assembly language and we knew it was ancient. A few brave souls ventured into Digital Research's CBASIC, FORTH, or a version of Tiny C.

The original IBM PC was a mix of off-the-shelf and new equipment. The keyboard came from the earlier IBM System/23, albeit with different labels on the keys. The motherboard was new. The operating system (MS-DOS) was new, but a clone of CP/M.

Our current equipment uses mixed-age technologies. Modern PCs have only just now lost the old PS/2 keyboard and mouse ports (dating back to the 1987 IBM PS/2) and the serial and parallel ports (dating back to the original 1981 IBM PC with the same connectors, and earlier still with larger, more rugged connectors).

Apple has done a good job at moving technology forward. The iPhone and iPad devices have little in the way of legacy hardware and software. Not bad for a company whose first serious product had a built-in keyboard but needed a television to display 24 rows of 40 uppercase characters.

Sunday, August 28, 2011

A successful project requires diverse skills

My introduction to the Pentaho suite (and specifically the "Spoon" and "Kettle" tools) gave me some insight into necessary skills for a successful project.

For a successful development project, you need several, diverse skills. All are important. Without any of them, the project will suffer and possibly fail.

Assuming that your project will use some tools (compilers, web servers, whatever), you need the skills necessary to use the tools. How they work. How they open files and read data. Which widgets are included and what they do. (Microsoft .NET has a problem with its widgets: there are too many of them. When building a solution, you must either use pre-built widgets or make your own. The .NET class collection is larger than a single person can comprehend, so you always fear that you are missing out on a simpler solution that uses a more powerful widget.)

Second is knowledge of the business resources. What data is available? How is the data named? Where is it stored? Most importantly, what does it mean? As with widgets, some data domains have "too much" data, such that a single person can never be sure that they have the right data. (Large organizations tend to have analysts dedicated to the storage and retrieval of data.) Beyond data, you need awareness of servers and network resources.

Third is knowledge of the business environment. What external requirements (regulations) affect your solution? What are the self-imposed requirements (policies)? What about authentication and data retention?

Fourth is the ability to interact with business folks and understand the requirements for the specific task or tasks. (What, exactly, should be on the report? What is its purpose? Who will be using it? What decisions do they make?)

Finally, we have programming skills. The notions of iteration, selection, and computation are needed to build a workable solution. You can lump in the skills of iterative development (extreme programming, agile development, or any set you like). Once you understand the tools and the data available, you must compose a solution. It's one thing to know the tools and the problem domain; it's another to assemble a solution. These skills are what make one a programmer.

Some organizations break the development of a system into multiple tasks, assigning different pieces to different individuals. This division of labor allows for specialization and (so managers hope) efficiency. Yet it also opens the organization to other problems: you need an architect overseeing the entire system to ensure that the pieces fit, and it is easy for an organization to fall into political bickering over the different subtasks.

Regardless of your approach (one person or a team of people), you need all of these skills.

Thursday, August 25, 2011

Farewell Steve Jobs

Jobs was the last of the original titans of microcomputers. There were many folks in the early days, but only a few known by name. Steve Wozniak, Gary Kildall, and Bill Gates were the others.

Those titans (the known-by-name and the unsung) made the microcomputer revolution possible. Companies like Apple, Microsoft, Digital Research, Radio Shack, Commodore, Heathkit, and even TI and Sinclair all made personal computing possible in the late 1970s and early 1980s.

There are few titans today. Yes, we have Steve Ballmer at Microsoft and Larry Ellison at Oracle, but they are business folks, not technologists. The open source community has its set (Linus Torvalds, Eric S Raymond, and others) but the world is different. The later titans are smaller, building on the shoulders of their earlier kin.

Steve Jobs and Apple taught us some valuable lessons:

Design counts: The design of a product is important. People will pay for well-designed products (and avoid other products).

Quality counts: People will pay for quality. Apple products have been priced higher than corresponding PC products, and people buy them.

Try things and learn from mistakes: Apple tried many things. There were several incarnations of the iPod before it became popular.

One can enter an established market: Apple entered the market with its iPod well after MP3 players were established and "the norm". It did the same with the iPhone.

One can create new markets: The iPad was a new thing, something previously unseen. Apple made the market for it.

Drop technology when it doesn't help: Apple products have mutated over the years, losing features that most folks would say are required for backwards compatibility. AppleTalk, the ADB keyboard and mouse ports, the old serial and SCSI ports, even FireWire have all been eliminated from the Apple line.

Use marketing to your advantage: Apple uses marketing strategically, coordinating it with products. It also uses it as a weapon, raising Apple above the level of the average technology company.

Replace your own products: Apple constantly introduces new products to replace existing Apple products. They don't wait for someone else to challenge them; they constantly raise the bar.

Focus on the customer: Apple has focused on the customer and their experience with the product. Their customer experience beats that of any other product, commercial or open source.

Apple must now live without Steve Jobs. And not only Apple, but all of us. Steve Jobs' influence was not merely within Apple but extended to the entire computing world.

Monday, August 22, 2011

Immutable Object Programming

I've been working with "Immutable Object Programming" and becoming more impressed with it.

Immutable Object Programming is object-oriented programming with objects that, once created, do not change. It is a technique used in functional programming, and I borrowed it as a transition from traditional object-oriented programming to functional programming.

Immutable Object Programming (IOP) enforces a discipline on the programmer, much as structured programming enforced a discipline on an earlier generation of programmers. With IOP, one must assemble all components of an object prior to its creation. Traditional object-oriented programming allows objects to change state; IOP does not. When you have new information, you do not mutate an existing object. Instead, you construct a new object from the old one plus the new information, often an object of a similar but different type. (For example, a Sale object and a payment are used to construct a CompletedSale object.)
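
Here is a minimal sketch of the pattern in C#. The classes and members below are invented for the example, not taken from a real project; what matters is that every field is set in a constructor and never changes afterward.

    // A minimal sketch of the pattern. The classes and members are invented
    // for the example.
    public sealed class Payment
    {
        public decimal Amount { get; }

        public Payment(decimal amount)
        {
            Amount = amount;
        }
    }

    public sealed class Sale
    {
        public string ItemName { get; }
        public decimal Price { get; }

        public Sale(string itemName, decimal price)
        {
            ItemName = itemName;
            Price = price;
        }

        // New information (a payment) does not change the Sale; it produces
        // a new object of a different type.
        public CompletedSale Complete(Payment payment)
        {
            return new CompletedSale(this, payment);
        }
    }

    public sealed class CompletedSale
    {
        public Sale Sale { get; }
        public Payment Payment { get; }
        public decimal Change { get; }

        public CompletedSale(Sale sale, Payment payment)
        {
            Sale = sale;
            Payment = payment;
            Change = payment.Amount - sale.Price;   // computed once, at construction
        }
    }

Once constructed, a CompletedSale can be passed around freely; nothing can alter it.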

IOP yields programs with many classes and mostly linear logic. The majority of statements are assignments -- often object constructions -- and the logic for iteration and decisions is contained within the constructor code.

As a programmer, I have a good feeling about the programs I write using IOP techniques. It is a feeling of certainty, a feeling that the code is correct. It is a good feeling.

I experienced this feeling once before, when I learned structured programming techniques. At the time, my programs were muddled and difficult to follow. With structured programming techniques, my programs became understandable.

I have not had that feeling since. I did not experience it with object-oriented programming; OOP was difficult to learn and not clarifying.

You can use immutable object programming immediately; it requires no new compiler or language. It requires a certain level of discipline, and a willingness to change. I use it with the C# language; it works with any modern language. (For this conversation, C++ is omitted from the set of modern languages.) I started with the bottom layer of our objects, the ones that are self-contained. Once the "elementary" objects were made immutable, I moved up a layer to the next set of objects. Within a few weeks I was at the highest level of objects in our code.

Monday, August 15, 2011

Iterating over a set is better than looping

When coding, I find it better to use the "foreach" iterator than the "for" loop.

The two are similar but not identical. The "for" operation is a loop for a fixed number of times; the "foreach" operation is applied to a set and repeats the contained code once for each member of the set. A "for" loop will often be used to achieve the same goal, but there is no guarantee that the number of iterations will match the size of the set. A "foreach" iteration is guaranteed to match the set.

For example, I was reviewing code with a colleague today. The code was:

for (int i = 0; i < max_size; i++)
{
    for (int j = 0; j < struct_size; j++, i++)
    {
        item[i] = // some value
    }
}

This is an unusual construct. It differs from the normal nested loop:
  • The inner loop increments both index values (i and j)
  • The inner loop contains assignments based on index i, but not j
What's happening here is that the j loop is used as a counter, and the index i is used as an index into the entire structure.

This is a fragile construct: the value max_size must contain the size of the entire "item" structure. Normally, max_size would hold the number of larger elements, each element containing struct_size items. Changing the size of item requires understanding (and remembering) this bit of code, since it must change too (or at least the initialization of max_size must change).

Changing this code to "foreach" iterators would make it more robust. It also requires us to think about the structures involved. In the previous code, all we know is that we have "max_size" number of items. If the set is truly a linear set, then a single "foreach" is enough to initialize them. (So is a single "for" loop.) If the set actually consists of a set of items (a set within a set), then we have code that looks like:

foreach (Item_set i in larger_set)
{
    foreach (Item j in i)
    {
        j = // some value
    }
}

Of course, once you make this transformation, you often want to change the variable names. The names "i" and "j" are useful for indices, but with iterators we can use names that represent the actual structures:

foreach (Item_set item_set in larger_set)
{
    foreach (Item item in item_set)
    {
        item = // some value
    }
}

Changing from "for" to "foreach" forces us to think about the true structure of our data and align our code with that structure. It encourages us to pick meaningful names for our iteration operations. Finally, it gives us code that is more robust and resilient to change.

I think that this is a win all the way around.

Sunday, August 14, 2011

The web and recruiting

When times are hard, and lots of people are seeking employment, companies have an easy time of hiring (for those companies that are even hiring). When times are good, more people are employed and fewer people seek employment. Companies wishing to hire folks have a more difficult time, since there are fewer people "in the market". The "market" of available people swings between a "buyer's market" and a "seller's market" as people become more or less available.

The traditional method of finding people is through staffing agencies and recruiting firms. I suspect that the internet and the web will change this. Companies and candidates can use new means of advertising and broadcasting to make information available, either about open positions or available talent. From Facebook and LinkedIn to contributions to open source projects, candidates make a wealth of information available.

As I see it, companies will fall into two general categories. I call them "group A" and "group B".

Group A companies will use the web to identify candidates and reach out to them. They will use the web as a means of finding the right people. Group A companies actively use the information that the web makes available.

Group B companies will use the web in a passive role. They will post their open positions and wait for candidates to apply. (And they will probably demand that the candidate submit their resume in Word format only. They won't accept a PDF or ODT file.) They may use web sites to check on candidates, probably looking for pictures of the person dressed as a pirate. I expect that they will do little to review contributions to open source projects.

When it comes to finding talent, the group A companies have the advantage. They can evaluate candidates early on and make offers to the people who have the skills that they seek. The group B companies have a harder time, as they have to filter through the applications and review candidates.

I suspect that, between the two strategies, the group A companies will be more effective and expend the smaller effort. It's more work per candidate, but less work overall. The group B companies will spend less time on each applicant, but more time overall. (And by spending less time on each candidate, they make less effective decisions.)

I also suspect that both groups will think that their strategy is the lesser effort and more effective.

So which company do you want to be? And which company would you rather work with?

Wednesday, August 10, 2011

A measure of quality

I propose that quality of code (source code) is inversely correlated to duplications within the code. That is, the more duplications, the worse the code. Good code will have few or no duplications.

The traditional argument against duplicate code is the increased risk of defects. When I copy and paste code, I copy not only the code but also all defects within the copied code. (Or if requirements later change and we must modify the copied code, we may miss one of the duplicate locations and make an incomplete set of changes.)

Modern languages allow for the consolidation of duplicate code. Subroutines and functions, parent classes, and code blocks as first-class entities allow the development team to eliminate the duplication of code.
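
As a small, hypothetical illustration of the first mechanism -- a block of validation logic that had been copied into two classes, pulled into one shared method:

    using System;

    // Hypothetical example: validation logic that had been copied into two
    // importers, consolidated into a single shared method.
    public static class NameRules
    {
        public static string CleanName(string raw)
        {
            if (raw == null) throw new ArgumentNullException("raw");
            string name = raw.Trim();
            if (name.Length == 0) throw new ArgumentException("name is empty");
            return name;
        }
    }

    public class CustomerImporter
    {
        // Both callers now share the one implementation; a fix lands everywhere.
        public string ImportName(string raw) { return NameRules.CleanName(raw); }
    }

    public class VendorImporter
    {
        public string ImportName(string raw) { return NameRules.CleanName(raw); }
    }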

So let us assume that duplicate code is bad. Is it possible to measure (or even detect) code duplications? The answer is yes. I have done it.

Is it easy to detect duplicate code? Again, the answer is yes. Most developers, after some experience with the code base, will know if there are duplicate sections of code. But is there an automated way to detect duplicate code?

And what about measuring duplicate code? Is it easy (or even possible) to create a metric of duplicate code?

Let's handle these separately.

Identifying duplicate blocks of code within a system can be viewed as a scaled-up version of the same problem between two files. Given two separate source files, how can one find the duplicate blocks of code? The method I used was to run a custom program on the two files, a program that identified common blocks of code. The program operated like 'diff', but in reverse: instead of finding differences, it found common blocks. (And in fact that is how we wrote our program. We wrote 'diff', and then changed it to output the common blocks and not the different blocks.)

Writing our 'anti-diff' utility (we called it 'common') was hard enough. Writing it in such a way that it was fast was another challenge. (You can learn about some of the techniques by looking for 'how is grep fast' articles on the web.)
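
As a rough sketch of the idea (not our actual tool, which used the diff-derived algorithm and the speed tricks mentioned above), a brute-force version in C# might look like this: read two files and report every run of identical lines longer than some minimum.

    using System;
    using System.IO;

    // Rough sketch of a "common" utility: report runs of identical lines
    // that appear in both files. A brute-force illustration, not the real,
    // speed-tuned tool.
    static class Common
    {
        const int MinRun = 4;   // ignore trivially short matches

        static void Main(string[] args)
        {
            string[] a = File.ReadAllLines(args[0]);
            string[] b = File.ReadAllLines(args[1]);

            for (int i = 0; i < a.Length; i++)
            {
                for (int j = 0; j < b.Length; j++)
                {
                    // Only start at the beginning of a matching run.
                    if (a[i] != b[j]) continue;
                    if (i > 0 && j > 0 && a[i - 1] == b[j - 1]) continue;

                    int length = 0;
                    while (i + length < a.Length && j + length < b.Length
                           && a[i + length] == b[j + length])
                    {
                        length++;
                    }

                    if (length >= MinRun)
                    {
                        Console.WriteLine("lines {0}-{1} of {2} match lines {3}-{4} of {5}",
                            i + 1, i + length, args[0], j + 1, j + length, args[1]);
                    }
                }
            }
        }
    }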

Once the problem has been solved for two files, you can scale it up to all of the files in your project. But be careful! After a moment's thought, you realize that to find all of the common blocks of code, you must compare every file against every other file, and this algorithm scales as O(n²). This is a bad factor for scaling, and we solved it by throwing hardware at the problem. (Fortunately, the algorithm is parallelizable.)
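
A sketch of that scale-up, assuming a FindCommonBlocks helper that wraps the two-file comparison above: enumerate every pair of files (the O(n²) part) and let the runtime spread the pairs across processors.

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Threading.Tasks;

    // Sketch only: run the two-file comparison over every pair of source
    // files. FindCommonBlocks is assumed to wrap the previous sketch.
    static class ProjectScan
    {
        static void Main(string[] args)
        {
            string[] files = Directory.GetFiles(args[0], "*.cs", SearchOption.AllDirectories);

            var pairs = new List<Tuple<string, string>>();
            for (int i = 0; i < files.Length; i++)
                for (int j = i + 1; j < files.Length; j++)   // every pair, once: O(n^2) pairs
                    pairs.Add(Tuple.Create(files[i], files[j]));

            // Each pair is independent, so the comparisons run in parallel.
            Parallel.ForEach(pairs, pair => FindCommonBlocks(pair.Item1, pair.Item2));
        }

        static void FindCommonBlocks(string fileA, string fileB)
        {
            // the two-file comparison sketched earlier would go here
        }
    }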

After more thought, you realize that there may be common blocks within a single file, and that you need a special case (and a special utility) to detect them. You are relieved that this special case scales as O(n).

Eventually, you have a process that identifies the duplicate blocks of code within your source code.

The task of identifying duplications may be hard, but assigning a metric is open to debate. Should a block of 10 lines duplicated twice (for a total of three occurrences) count the same as a block of 15 lines duplicated once? Is the longer duplication worse? Or is the more frequent duplication the more severe?

We picked a set of "badness factors" and used them to generate reports. We didn't care too much about the specific factors, or the "quantity vs. length" problem. For us, it was more important to use a consistent set of factors, get a consistent set of metrics, and observe the overall trend. (Which went up for a while, and then levelled off and later decreased as we requested a reduction in duplicate code. Having the reports of the most serious problems was helpful in convincing the development team to address the problem.)
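
As a sketch of what such a metric might look like (the weighting here -- block length times the number of extra copies -- is just one plausible choice, not necessarily the factors we used):

    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical duplication metric. The weighting is an assumed example;
    // the point is to apply the same formula every time.
    class DuplicateBlock
    {
        public int LineCount;     // length of the duplicated block, in lines
        public int Occurrences;   // total number of copies found in the code base
    }

    static class DuplicationScore
    {
        public static int Score(IEnumerable<DuplicateBlock> blocks)
        {
            // every copy beyond the first contributes its full length
            return blocks.Sum(b => b.LineCount * (b.Occurrences - 1));
        }
    }

Under this particular weighting, a 10-line block with three occurrences scores 20 and a 15-line block with two occurrences scores 15; a different weighting could reverse that ranking, which is another reason the consistency of the factors matters more than their exact values.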

In the end, one must review the costs and the benefits. Was this effort of identifying duplicate code worth the cost? We like to think that it was, for four reasons:

We reduced our code base: we identified and eliminated duplicate code.

We corrected defects: We identified near-identical code and found that the near-duplicates had started as true duplicates, with fixes applied to some copies but not others. We combined the code and ensured that all code paths had the right fixes.

We demonstrated an interest in the quality of the code: Rather than focus on only the behavior of the code, we took an active interest in the quality of our source code.

We obtained a leading indicator of quality: Regression tests are lagging indicators of quality, observable only after the coding is complete. Duplicate code can be measured directly from the source, from the first day of the project, so we get measurements immediately.

We believe that we get the behavior that we reward. By imposing soft penalties for duplicate code, measuring the code, and distributing that information, we changed the behavior of the development team and improved the quality of our code. We made it easy to eliminate the duplicate code, by providing lists of the duplicate code and the locations within the code base.

Sunday, August 7, 2011

My next PC won't be real

After working a bit with Eclipse and the Android SDK, I have come to the realization that my PC is a bit ... lame ... and needs to be replaced. The PC is an old Dell Optiplex that I found in the "giving place" in my apartment building. Someone else was throwing it away, and I adopted it. I replaced the disk and pushed the memory to the limit. It has served me well for the past few years, but it cannot handle the development tools effectively.

The question is: what new PC do I select? Should it be a PC? Or a Mac? Or a tablet? I'm thinking that the new PC will be none of these.

Perhaps my new PC will be a virtual PC in a cloud. Instead of buying a physical PC and installing it at home, I can rent a virtual PC on a cloud service (for example, amazon.com's EC2, or VMware's cloud). I don't need the PC -- all I need is the processing power to drive Eclipse and the Android SDK.

Think about it... I don't need the PC box sitting in my room. I don't need the cables and wires. All I need is processing power... and I don't really care about the location of the processor. With the internet, I can run my applications anywhere.

I *do* need a way to talk to the virtual PC. I need a mechanism to control Eclipse and see my files. (Really, I want a tablet app that lets me talk to Eclipse in the cloud.)

In the eldar days (before the first IBM PC), the only folks using microcomputers were determined hobbyists and a few really determined business users. The latter thought that they wanted computers, but all they really wanted was the business applications.

Today, people use computers but only from inertia, and Apple has shown another path. People want processing power, not computers. They don't care about CPUs or memory... all they want is for their apps to work.

Current cloud apps are offering a glimpse of the future of apps. Phones and tablets have made the purchase and installation of apps easy and inexpensive. GMail and Google Docs have made apps available wherever there is an internet connection.

Why would anyone go through the trouble of buying and installing a physical PC?

The current cloud offerings are built around servers. Amazon.com EC2 is designed for servers (build your own, but talk to it like a server). Google App Engine is built to run web apps.

The next wave will be personal computers in the cloud: computers that can run plain apps and talk to you through a VNC client, or maybe even a browser.

The following wave will be personal apps in the cloud... and apps will mutate to live in the cloud. The app will have two parts: a small client on your phone or tablet and a back end on the server in the cloud.

Tuesday, August 2, 2011

Personal or impersonal?

There are two ways to design software: impersonal and personal. Interestingly, the same two methods apply to businesses.

Consider airline travel and the instructions given to passengers. Anyone who has travelled on a commercial flight knows these instructions (how to fasten and wear a seatbelt, flotation devices, oxygen masks, and no smoking in the lavatories). Airlines are mandated to provide these instructions to passengers.

Many airlines view this task as a cost, and do everything they can to minimize that cost. They provide the instructions on recorded video, which reduces the cost and ensures a consistent delivery of the message. In doing so, they make the procedure impersonal. Some might argue that it is inhuman.

Southwest Airlines takes a different approach. On all of their flights, the flight attendants provide the message. There is no recording, no impersonal message. People deliver the message. Some crews take the opportunity to customize the message, adding humorous comments to the instructions. The experience on Southwest is more enjoyable.

The same concept applies to software. Now, I don't expect you to visit each of your customers and provide humorous instructions for the use of your software. I will ask you to think about the experience that you provide to your customers. Is it consistent and impersonal?

The change from desktop PC to laptop saw no real change in the user experience. The change from laptop PC to smart phone (or tablet) is larger; customers have a more intimate relationship with these devices. Impersonal applications will find little traction in the smartphone and tablet market.

You can make the customer experience what you want it to be. The question is, what do you want it to be?