Monday, March 31, 2014

Microsoft Azure may be the new Windows

For the past two decades Microsoft has used the Windows platform to build its empire. Microsoft delivered a capable combination of operating system and applications. Microsoft's applications ran on Windows (and only Windows) and used proprietary formats. The combination gave Microsoft a near-stranglehold on the market.

The world is changing. Perhaps it is time for Microsoft to move on to something new. Here's why:

File formats The file formats for Microsoft applications are now open and documented (due, in part, to court decisions). Proprietary formats once worked to Microsoft's advantage; now non-Microsoft applications can read and write files that can be exchanged with Microsoft applications.

Other operating systems for personal computers Mac OS and Linux are capable operating systems. Both have a significant number of applications. One can run a home or a small office with Windows, Mac OS, or Linux.

Competing applications The Microsoft Office suite is no longer the only game in town. Competing applications handle word processing, spreadsheets, presentations, e-mail, and even project management.

The Web Applications are moving from PC desktops to the browser, a trend that may have been started by Microsoft itself, with its web version of Outlook.

Phones and tablets Mobile devices offer a new vision of computing, one that entails less administration.

I think that Microsoft has looked at these changes and decided that Windows is not the way forward. I think that Windows, while still an important part of Microsoft's offerings, is no longer the center of its world.

Microsoft's re-branding of "Windows Azure" as "Microsoft Azure" is telling. The cloud computing platform supports more than Windows, and more than just Microsoft's Windows-centric languages.

Windows is an old operating system. It carries a lot of baggage: code to ensure compatibility with previous versions. Linux and Mac OS are based on the even older Unix, yet Windows has seen more churn as Microsoft added features and fixed defects. It may be that previous design decisions, the accumulated baggage of two decades, are limiting the ability of Windows to rise to new challenges.

My guess is that Microsoft may de-emphasize Windows and focus on subscriptions such as Office 365 and the web version of Visual Studio. Such a change would correspond to a move from the PC platform to a cloud platform. Instead of Windows, Microsoft will sell its Azure platform.

The knowledgeable reader will point out that Azure is built on Windows, so Windows is still part of the system. This is true -- for now. I expect Microsoft to replace Azure's "core" of Windows with an operating system better suited to servers and cloud processing, just as it replaced the early Windows "core" of MS-DOS. Windows was, in its early incarnations, a DOS application. Microsoft expanded it into a full operating system, one that surpassed MS-DOS.

I think Microsoft can do the same with Azure. Initially a system built on Windows, Azure can grow beyond Windows into a better, more capable operating system for cloud computing.

Windows made sense when people installed software on their personal computers. Today, people buy apps and the installation is automatic. The world is ready for a successor to Windows, and I think Azure can be that successor.

Sunday, March 30, 2014

How to untangle code: Start at the bottom

Messy code is cheap to make and expensive to maintain. Clean code is not so cheap to create but much less expensive to maintain. If you can start with clean code and keep the code clean, you're in a good position. If you have messy code, you can reduce your maintenance costs by improving your code.

But where to begin? The question is difficult to answer, especially on a large code base. Some ideas are:
  • Re-write the entire code
  • Re-write logical sections of code (vertical slices)
  • Re-write layers of code (horizontal slices)
  • Make small improvements everywhere
All of these ideas have merit -- and risk. For very small code sets, a complete re-write is possible. For a system larger than "small", though, a re-write entails a lot of risk.

Slicing the system (either vertically or horizontally) has the appeal of independent teams. The idea is to assign a number of teams to the project, with each team working on an independent section of code. Since the code sets are independent, the teams can work independently. This is an appealing idea but not always a practical one. It is rare that a system is composed of independent sub-systems. More often, the system is composed of several mutually dependent sub-systems, and adjustments to any one sub-system ripple throughout the code.

One can make small improvements everywhere, but this has its limits. The improvements tend to be narrow in scope and systems often need high-level revisions.

Experience has taught me that improvements must start at the "bottom" of the code and work upwards. Improvements at the bottom layer can be made with minimal changes to higher layers. Note that there are some changes to higher layers -- in most systems there are some effects that ripple "upwards". Once the bottom layer is "clean", one can move upwards to improve the next-higher level.

How to identify the bottom layer? In object-oriented code, the process is easy: classes that can stand alone are the bottom layer. Object-oriented code consists of different classes, and some (usually most) classes depend on other classes. (A "car system" depends on various subsystems: "drive train", "suspension", "electrical", etc., and those subsystems in turn depend on smaller components.)

No matter how complex the hierarchy, there is a bottom layer. Some classes are simple enough that they do not include other classes. (At least not other classes that you maintain. They may contain framework-provided classes such as strings and lists and database connections.)

These bottom classes are where I start. I make improvements to these classes, often making them immutable (so they can hold state but cannot change state). I change their public methods to use consistent names. I simplify their code. When these "bottom" classes are complex (when they hold many member variables), I split them into multiple classes.
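
Here is a minimal sketch of the kind of class I mean (the class and its members are hypothetical, not from any particular system): it depends only on framework-provided types, it is immutable, and its public functions use consistent names.

#include <string>

class postal_address
{
public:
    postal_address ( const std::string & street, const std::string & city )
        : my_street(street), my_city(city)
    { }

    // 'const' reporting functions only; nothing can change the state
    std::string street ( void ) const { return my_street; }
    std::string city ( void ) const   { return my_city; }

private:
    const std::string my_street;
    const std::string my_city;
};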

The result is a set of simpler, cleaner code that is reliable and readable.

Most of these changes affect the other parts of the system. I make changes gradually, introducing one or two and then re-building the system and fixing broken code. I create unit tests for the revised classes. I share changes with other members of the team and ask for their input.

I don't stop with just these "bottom" classes. Once they are cleaned, I move up to the next level of code: the classes that depend only on framework classes and the newly-cleaned classes. With a solid base of clean code below, one can improve the next layer of classes. The improvements are the same: make classes immutable, use consistent names for functions and variables, and split complex classes into smaller classes.

Using this technique, one works from the bottom of the code to the top, cleaning all of the code and ensuring that the entire system is maintainable.

This method is not without drawbacks. Sometimes there are cyclic dependencies between classes and there is no clear "bottom" class. (Good judgement and re-factoring can usually resolve that issue.) The largest challenge is not technical but political -- large code bases with large development teams often have developers with egos, developers who think that they own part of the code. They are often reluctant to give up control of "their" code. This is a management issue, and much has been written on "egoless programming".

Despite the difficulties, this method works. It is the only method that I have found to work. The other approaches too often run into the problem of doing too much at once. The "bottom up" method allows for small, gradual changes. It reduces risk, but cannot eliminate it. It lets the team work at a measured pace, and lets the team measure their progress (how many classes cleaned).

Monday, March 24, 2014

Software is not always soft

Tim O'Reilly asks, in a tweet that links to a longer article, "Can hardware really change as fast as software?"

We're used to the idea that software changes faster than hardware. It's widely accepted as common knowledge. ("Of course software changes faster than hardware! Software is soft and hardware is hard!")

Yet it's not that simple.

Software is easy to change... sometimes. There are times when software is easier to change, and there are times when software is harder to change.

Tim O'Reilly's tweet (and the referenced article) consider software in the context of cell phones. While cell phones have been changing over time, the apps for phones tend to "rev" faster. But consider the software used on PCs. Sometimes PC software changes at a rate much slower than the hardware. The "Windows XP problem" is an example: people stay with Windows XP because their software runs on Windows XP (and not later versions of Windows).

Long-term software is not limited to PCs. Corporations and governments have large systems built with mainframe technology (COBOL, batch processing) and these systems have outlasted several generations of mainframe hardware. These systems are resistant to change and do not easily translate to our current technology set of virtualized servers and cloud computing.

What makes some software easy to change and other software hard? In my view, the answer is not in the software, but in the culture and processes of the organization. "Hard" software is a result, not a cause.

Development teams that use automated testing and that refactor code frequently have a better chance of building "soft" software -- software that is easy to change. Tests keep the developers "honest" and alert them to problems. Automated tests are cheap to run, so they can be run frequently, giving developers immediate feedback. When those tests are also comprehensive, developers get complete feedback and are alerted to any deviation from the requirements.
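
As a minimal sketch of the idea (the class, header, and values here are hypothetical, and a real project would likely use a test framework), an automated test can be as small as a program full of assertions:

#include <cassert>

#include "totals.h"   // hypothetical header for the class under test

int main ( void )
{
    totals t;
    t.add( 100.0 );
    t.add( 50.0 );

    // the test fails loudly if this behavior ever changes
    assert( t.value() == 150.0 );

    return 0;
}

Run after every build, a suite of such tests tells the developers immediately when a change has broken something.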

Refactoring is important; it allows developers to improve the code over time. We rarely get code right the first time. Often we are happy that it works, and we don't care about the simplicity or the consistency of the code. Refactoring lets us re-visit that code and make it simpler and consistent -- both of which make it easier to understand and change.

Development teams that use manual methods of testing (or no testing!) have little chance of building "soft" software. Without automated tests, the risk of introducing a defect while making a change is high. Developers and managers will both avoid unnecessary changes and will consider refactoring to be an unnecessary change. The result is that code is developed but never simplified or made consistent. The code remains hard to read and difficult to change.

If you want software to be soft -- to be easy to change -- then I encourage automated testing. I see no way to get "soft" software without it.

On the other hand, if you want "hard" software -- software that is resistant to change -- then skip the automated testing. Build a culture that avoids improvements to the code and allows only those changes that are necessary to meet new requirements.

But please don't complain about the difficulty of changes.

Sunday, March 23, 2014

How to untangle code: Limit functions to void or const

Over time, software can become messy. Systems that start with clean and readable code often degrade with hastily-made changes to code that is hard to understand. I untangle that code, restoring it to a state that is easy to understand.

One technique I use with object-oriented code is to limit member functions to 'void' or 'const'. That is, a function may change the state of an object or it may report a value contained in the object, but it cannot do both.

Dividing functions into two types - mutation functions and reporter functions - reduces the logic which modifies the object. Isolating the changes of state is a good way to simplify the code. Most difficulties with code occur with changes of state. (Functional programming avoids this problem by using immutable objects which can never change once they are initialized.)

Separating the logic that changes an object from the logic that reports on an object's state also frees the use of reporting functions. A combination function, one that reports a value and also changes state, can be called only when a change to the state is desired. A 'const' function, in contrast, can be called at any time, and any number of times, because it does not change the object's state. Thus you can refactor the code that calls 'const' functions and change the sequence (and frequency) of those calls with confidence.

Here's a simple example. The 'combined' function:

double foo::update_totals ( double new_value )
{
    my_total += new_value;
    return my_total;
}

can be separated into two:

void foo::update_totals ( double new_value )
{
    my_total += new_value;
}

double foo::total ( void ) const
{
    return my_total;
}

These two functions perform the same operations as the single combined function, but now you are free to call the second function (total()) as many times as you like. Notice that total() is marked as const. It cannot change the state of the object.
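
For completeness, here is the sort of class declaration these definitions assume (a minimal sketch; the constructor is my own addition):

class foo
{
public:
    foo ( void ) : my_total(0.0) { }

    void update_totals ( double new_value );   // mutation only
    double total ( void ) const;               // reporting only

private:
    double my_total;
};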

Your original calling code also changes:

{
    foo my_foo;

    // some code

    double total = my_foo.update_totals( 100.0 );
}

becomes:

{
    foo my_foo;

    // some code

    my_foo.update_totals( 100.0 );
    double total = my_foo.total();
}

An additional benefit of separating mutation logic from reporting logic is that functions are smaller. Smaller functions are easier to comprehend. (Yes, the calling code is slightly longer, but the simplification gained inside the class outweighs the small increase in the calling code.)

Messy code is ... messy. You can make it less messy by separating mutation functions from reporting functions, and ensuring that all functions are either one or the other.

Wednesday, March 19, 2014

The fecundity of programming languages

Some programming languages are more rigorous than others. Some programming languages are said to be more beautiful than others. Some programming languages are more popular than others.

And some programming languages are more prolific than others, in the sense that they are the basis for new programming languages.

Algol, for example, influenced the development of Pascal and C, which in turn influenced Java, C# and many others.

FORTRAN influenced BASIC, which in turn gave us CBASIC, Visual Basic, and True Basic.

The Unix shell led to Awk and Perl, which influenced Python and Ruby.

But COBOL has had little influence on languages. Yes, it has been revised, including an object-oriented version. Yes, it guided the PL/I and ABAP languages. But outside of those business-specific languages, COBOL has had almost no effect on programming languages.

Why?

I'm not certain, but I have two ideas: COBOL was an early language, and it was designed for commercial use.

COBOL is one of the earliest languages, dating back to the 1950s. Other languages of the time include FORTRAN and LISP (and oodles of forgotten languages like A-0 and FLOW-MATIC). We had no experience with programming languages. We didn't know what worked and what didn't work. We didn't know which language features were useful to programmers. Since we didn't know, we had to guess.

For a near-blind guess, COBOL was pretty good. It has been useful in close to its original form for decades, a shark in the evolution of programming languages.

The other reason we didn't use COBOL to create other languages is that it was commercial. It was designed for business transactions. While it ran on general-purpose computers, COBOL was specific to financial applications, and the people who would tinker and build new languages were working in other fields, on computers other than business mainframes.

The tinkerers were using minicomputers (and later, microcomputers). These were not in the financial setting but in universities, where people were more willing to explore new languages. Minicomputers from DEC were often equipped with FORTRAN and BASIC. Unix computers were equipped with C. Microcomputers often came with BASIC baked in, because it was easier for individuals to use.

COBOL's success in the financial sector may have doomed it to stagnancy. Corporations (especially banks and insurance companies) lean conservative with technology and programming; they prefer to focus on profits and not research.

I see a similar future for SQL. As a data description and access language, it does an excellent job. But it is very specific and cannot be used outside of that domain. The up-and-coming NoSQL databases avoid SQL in part, I think, because the SQL language is tied to relational algebra and structured data. I see no languages (well, no popular languages) derived from SQL.

I think the languages that will influence or generate new languages will be those which are currently popular, easily learned, and easily used. They must be available to the tinkerers of today; those tinkerers will be writing the languages of the future. Tinkerers have limited resources, so less expensive languages have an advantage. Tinkerers are also a finicky bunch, with only a few willing to work with ornery products (or languages).

Considering those factors, I think that future languages will come from a set of languages in use today. That set includes C, C#, Java, Python, and JavaScript. I omit a number of candidates, including Perl, C++, and possibly your favorite language. (I consider Perl and C++ difficult languages; tinkerers will move to easier languages. I would like to include FORTH in the list, but it too is a difficult language.)

Monday, March 17, 2014

Mobile changes how we think about computing

The rise of mobile computers, while not a revolution, does introduce a significant change in our thinking about computing. I believe that this change generates angst for many.

The history of computing has seen three major waves of technology. Each of these waves has had a specific mindset, a specific way that we view computing.

Mainframes The first wave of computing was the mainframe era. Computers were large, expensive, magical boxes that were contained in sealed rooms (temples?) and attended by technicians (priests?). The main task of computers was to calculate numbers for the company (or government), and most jobs were either accounting or specific mathematical calculations (think "ballistics tables").

Minicomputers The second wave of computing was the minicomputer era. Computers were the size of refrigerators or washing machines and could be purchased by departments within a company (or a university). They did not need a sealed room with special air conditioning, although they were usually stored in locked rooms to prevent someone from wheeling them away. The main tasks were still corporate accounting, inventory management, order processing, and specific mathematical calculations.

Personal computers The third wave of computing saw a major shift in our mindset of computing. Personal computers could be purchased (and run) by individuals. They could be used at home or in the office (if you carried it in yourself). The mindset for personal computing was very different from the corporate-centered computing of the previous eras. Personal computing could be used for ... anything. The composition and printing of documents was handled by word processors. Spreadsheets let us calculate our own budgets. Small databases (and later larger databases) let us store our own transaction data. If off-the-shelf software was not suitable to the task, we could write our own programs.

The mindset of personal computing has been with us for over thirty years. The size and shape of personal computers have stayed roughly the same: the same CPU box, the same keyboard, the same monitor. We know what a PC looks like. The software has seen one major change, from DOS to Windows, but Windows has been with us for the past twenty years. We know what programs look like.

The introduction of tablets has caused us to re-think our ideas of computing. And we're not that good at re-thinking. We see tablets and phones and they seem strange to us. The size and shape are different (and therefore "wrong"); the user interface is different (and therefore "wrong"); the way we purchase applications is different (and therefore "wrong"); even the way we call applications ("apps") is different (and therefore... you get the idea).

I observe that mobile devices caused little discomfort while they remained in the consumer market. Phones that could play music and games were not a problem. Tablets that let one scroll through Facebook or read books were not a problem. These were extensions to our existing technology.

Now phones and tablets are moving into the commercial sphere, and their application is not obvious. It is clear that they are not personal computers -- their size and shape prove that. But there are more differences that cause uncertainty.

Touch interface The user interface for phones and tablets is not about keyboards and mice but about taps and swipes.

Small screen Tablets have small-ish screens, and phones have tiny screens. How can anyone work on those?

Different operating systems Personal computers run Windows (except for a few in the marketing groups that use Mac OS). Tablets run something called "Android" or something else called "iOS".

Something other than Microsoft Microsoft's entries in the phone and tablet market are not the market leaders and their operating systems have not been accepted widely.

Even Microsoft isn't Microsoft-ish Microsoft's operating system for phones and tablets isn't really Windows; it is this thing called "Windows 8". The user interface looks completely different. Windows RT doesn't run "classic" Windows programs at all (except for Microsoft Office).

The changes coming from mobile are only one front; changes to the PC are also coming.

The typical PC is shrinking Display screens have become flat. The CPU box is shrinking, losing the space for expansion cards and empty disk bays. Apple's Mac Mini, Intel's Next Unit of Computing, and other devices are changing how we look at computers.

Windows is changing Windows 8 is very different from "good old Windows". (My view is that Windows 8's tiles are simply a bigger, better "Start" menu, but many disagree.)

These changes mean that one cannot stay put with Windows. You either advance into the mobile world or you advance into the new Windows world.

The brave new worlds of mobile and Windows look and feel very different from the old world of computing. Many of our familiar techniques are replaced with something new (and strange).

We thought we knew what computers were and what computing was. Mobile changes those ideas. After thirty years of a (roughly) constant notion of personal computing, many people are not ready for a change.

I suspect that the people who are hardest hit by the changes of mobile are those aged 25 to 45; old enough to know PCs quite well but not old enough to remember the pre-PC days. This group never had to go through a significant change in technology. Their world is changing and few are prepared for the shock.

The under-25 crowd will be fine with tablets and computers. It's what they know and want.

Interestingly, the over-45 folks will probably weather the change. They have already experienced a change in computing, either from mainframes or minicomputers to personal computers, or from nothing to personal computers.

Sunday, March 16, 2014

How to untangle code: make member variables private

Tangled code is hard to read, hard to understand, and hard to change. (Well, hard to change and get right. Tangled code is easy to change and get the change wrong.)

I use a number of techniques to untangle code. Once untangled, code is easy to read, easy to understand, and easy to change (correctly).

One technique I use is to make member variables of classes private. In object-oriented programming languages, member variables can be marked as public, protected, or private. Public variables are open to all, and any part of the code can change them. Private variables are walled off from the rest of the code; only the "owning" object can make changes. (Protected variables live in an in-between state, available to some objects but not all. I tend to avoid the "protected" option.)

The benefits of private variables are significant. With public variables, an object's member variables are available to all parts of the code. It is the equivalent of leaving all of your belongings on your front lawn; anyone passing by can take things, leave new things, or just re-arrange your stuff. Private variables, in contrast, are not available to other parts of the code. Only the object that contains the variable can modify it. (The technical term for this isolation is "encapsulation".)

But one cannot simply change the designation of member variables from "public" to "private". Doing so often breaks existing code, because sections of the code have been built with the ability to access those variables.

The process of "privatizing" member variables is slow and gradual. I start with a general survey of the code and then select one class and change its member variables to private. The class I select is not picked at random. I pick a class that is "elementary", one that has no dependencies on other classes. These "elementary" classes are easier to "privatize", since they can be modified without reliance on other classes. They also tend to be simpler and smaller than most classes in the system.

But while they do not depend on other classes, changes to them may affect other parts of the code. These low-level "elementary" classes are used by other classes in the system, and those dependent classes often break with the changes. Making member variables private means that those other classes cannot simply "reach into" the elementary class anymore.

To fix these problems, I create special accessor functions. These functions let other classes read (or write) the member variables. Often I find that only the "read" accessor is necessary. Sometimes the "write" accessor is necessary as well.

After I make the member variables private, I create the accessor functions. Then I modify the dependent code to use not the member variable but the accessor function.
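
Here is a minimal sketch of the change (the class and member names are hypothetical):

// before: any code can modify the balance directly
class account
{
public:
    double balance;
};

// after: the member variable is private, with a "read" accessor
// and a mutation function for deposits
class account
{
public:
    account ( void ) : my_balance(0.0) { }

    double balance ( void ) const { return my_balance; }
    void deposit ( double amount ) { my_balance += amount; }

private:
    double my_balance;
};

Dependent code then changes from "the_account.balance" to "the_account.balance()".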

These are simple changes; they do not change the semantics of the code. The compiler helps you: once you make the member variables private, it gleefully points out the other parts of the code that want access to the now-private variables. You know that you have corrected all of the dependent code when the compiler stops complaining.

Making member variables private is one step in untangling code, but not the only step. I will share more of my techniques in future posts.

Tuesday, March 11, 2014

Enterprise software helps the enterprise

A while back I worked at a large enterprise. They had a number of enterprise-class systems, but one system was, in my mind, less than enterprise class.

The system in question supported their internal help desk. This was the team that supported everyone else in the company. They installed PCs and administered the network and its shared resources (file servers and printers, mainly).

To manage requests for assistance, the support team used a trouble-ticket system. Individuals could visit an internal web site to submit requests for assistance.

Their trouble-ticket system was probably sold as an enterprise solution. I also suspect that it had a large price tag (a requirement for enterprise-class software) and a complex contract (also a requirement for enterprise-class software).

The ticket system was designed for the support team. New requests could be evaluated, prioritized, and assigned to team members. The system allowed for time estimates for each request and balanced the workload among team members. It had lots of query and reporting options.

But this trouble-ticket system was not an enterprise-class system -- at least not in my opinion. Let me explain.

The trouble-ticket system was not merely designed for the support team, it was optimized for it. But it was poorly designed for the people who submitted requests.

A person could submit a request. The request was a semi-complicated form which needed a login, then information about the submitter (name, department, phone number, and mail stop), and finally a description of the problem (problem type, text description, and priority). Once a request was submitted, the system provided a ticket number, a ten-digit code.

Here's where the system lost its enterprise status.

The ten-digit code was the only way to check on the status of a request. There was no way to log in to the system and say "show me all of my open requests". To check on a request, you had to have the ticket number. To check on multiple requests, you had to have the ticket numbers for each request.

The effect was to push work onto other people in the organization. People had to record their ticket numbers. (The system did not even send an e-mail; it only displayed the ticket number on a web page.) Recording ticket numbers was a small amount of work, but not zero.

It's not hard to design systems to allow users to log in and see the status of their requests. But this was apparently too much work for the designers of this trouble-ticket system.

True enterprise-class systems work for the entire enterprise. They reduce work all around, or if they must push work from one group to another, it is minimal and necessary.

Systems that push work from one group to another for no good reason are not enterprise-class systems. They may be sold as enterprise-class solutions, they may have expensive price tags, and they may have complicated contracts, but those attributes do not make for enterprise-class software.

Enterprise-class software helps not just individual teams or groups but the entire enterprise become efficient and effective.

Monday, March 10, 2014

IBM makes... mainframes

IBM, that venerable member of the technology world, built its reputation on mainframe computers. And they are still at it.

In the 1940s and 1950s, computing devices were specific to the task. We didn't have general purpose computers; we had tabulators and sorters and various types of machines. The very early electronic calculators were little more than adding machines -- addition was their only operation. The later machines were computers, albeit specialized, usually for military or commercial needs. (Which made some sense, as only the government and large corporations could afford the machines.)

IBM's System/360 changed the game. It was a general purpose machine, suitable for use by government, military, or commercial organizations. IBM's System/370 was a step up with virtual memory, dual processors, and built-in floating point arithmetic.

But these were still large, expensive machines, and these large, expensive machines defined the term "mainframe". IBM was the "big company that makes big computers".

Reluctantly, IBM entered the minicomputer market to compete with companies like DEC and Data General.

Also reluctantly, IBM entered the PC market to compete with Apple, Radio Shack, and other companies that were making inroads into the corporate world.

But I think, in its heart, IBM remained a mainframe company.

Why do I think that? Because over the years IBM has adjusted its product line. Look at what they have stopped producing:

  • Typewriters
  • Photocopiers
  • Disk drives
  • Tape drives
  • Minicomputers
  • Microcomputers (PCs)
  • Laptop computers
  • Printers for PCs

And look at what they have kept in their product line:

  • Mainframe computers
  • Servers
  • Cloud-based services
  • Watson

The last item, Watson, is particularly telling. Watson is IBM's super-sized information storage and retrieval system. It is quite sophisticated and has appeared (successfully) on the "Jeopardy!" TV game show.

Watson is a product that IBM is marketing to large companies (and probably the government). They do not offer a "junior" version for smaller companies or university departments. They do not offer a "personal" version for individuals. IBM's Watson is today's equivalent of the System/360 computer: large, expensive, and made for wealthy clients.

So IBM has come full circle, from the System/360 to minicomputers to personal computers and back to Watson. Will they ever offer smaller versions of Watson? Perhaps, if other companies enter the market and force IBM to respond.

We PC revolutionaries wanted to change the world. We wanted to bring computing to the masses. And we wanted to destroy IBM (or at least take it down a peg or two). Well, we did change the world. We did bring computing to the masses. We did not destroy IBM, or its mainframes. IBM is still the "big company that makes big computers".

Sunday, March 9, 2014

How to untangle code: Remove the tricks

We all have our specialties. Mine is the un-tangling of code. That is, I can re-factor messy code to make it readable (and therefore maintainable).

The process is sometimes complex and sometimes tedious. I have found (discovered?, identified?) a set of practices that allow me to untangle code. As practices, they are imprecise and subject to judgement. Yet they can be useful.

The first practice is to get rid of the tricks. "Tricks" are the neat little features of the language.

In C++, two common types of tricks are pointers and preprocessor macros. (And sometimes they are combined.)

Pointers are to be avoided because they can often cause unintended operations. In C, one must use pointers; in C++ they are to be used only when necessary. One can pass a reference to an object instead of a pointer (or better yet, a reference to a const object). The reference is bound to an object and cannot be changed; a pointer, on the other hand, can be changed to point to something else (and only if you are very disciplined will that something else be another instance of the same class).

We use pointers in C (and in early C++) to manage elements in a data structure such as a list or a tree. While we can use references, it is better to use containers from the C++ STL (or the Boost library). These containers handle memory allocation and de-allocation. I have successfully untangled programs and eliminated all "new" and "delete" calls from the code.
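
A minimal sketch of the kind of replacement I mean (the names and values are hypothetical):

#include <string>
#include <vector>

// before: a hand-rolled linked list of raw pointers
//     node * head = new node("alpha");
//     head->next = new node("beta");
//     ...
//     delete head->next;
//     delete head;

// after: a standard container owns the memory
void build_names ( std::vector<std::string> & names )
{
    names.push_back( "alpha" );
    names.push_back( "beta" );
    // no "new" and no "delete"; the vector releases its own memory
}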

The other common trick of C++ is the preprocessor. Preprocessor macros are powerful constructs that let one perform all sorts of mischief, including changing function names, language keywords, and constant values. Simple macro definitions such as

#define PI 3.1415

can be written in Java or C# (or even C++) as

const double PI = 3.1415;

so one does not really need the preprocessor for those definitions.

More sophisticated macros such as

#define array_value(x, y) ((y) < 100 ? (x)[(y)] : (x)[0])

let you check the bounds of an array (against a hard-coded limit, in this case), but the std::vector<> container's at() function performs this checking for you.
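
A minimal sketch of the same check with at() (the function name is mine):

#include <cstddef>
#include <vector>

double checked_value ( const std::vector<double> & x, std::size_t y )
{
    return x.at( y );   // throws std::out_of_range instead of reading past the end
}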

The preprocessor also lets one construct function calls at compile time:

#define call_func(x, y, a1, a2) func_##x##y(a1, a2)

to convert this code

call_func(stats, avg, v1, v2);

to this

func_statsavg(v1, v2);

Mind you, the source code contains only the unconverted line, never the converted line. Your debugger does not know about the post-processed line either. In a sense, #define macros are lies that we programmers tell ourselves.

Worse, such macros are specific to C++ (or to C, if they avoid object-oriented notation). When you write code that relies on the preprocessor, you lock the code into those languages. Java, C#, and later languages do not have a preprocessor (or anything like it).

So when un-tangling code (sometimes with the objective of moving code to another language), one of the first things I do is get rid of the tricks.

Thursday, March 6, 2014

I thought I knew which languages were popular

An item on Hacker News led me to a blog post by the folks at Codacy. The topic was "code reviews" and they specifically talked about coding style. The people at Codacy were kind enough to provide links to style guidelines for several languages.

I expected the usual suspects: C, C++, Java, C# and perhaps a few others.

Here's the list they presented:

  • Ruby
  • Javascript
  • Python
  • PHP

My list was based on the "business biggies", the languages that are typically used by large businesses to "get the job done". Codacy is in business too, so they are marketing their product to people who can benefit from their offerings. In other words, this is not an open-source, give-it-away-free situation.

Yet none of the languages they list are on my "usual suspects" list, and none of my languages are on theirs. The two lists are mutually exclusive.

The languages that they do list, I must admit, are popular and mature. Individuals and companies use them to "get the job done". Codacy thinks that they can operate their business by focussing on these languages. (Perhaps they have plans to expand to other languages. But even starting with this subset tells us something about their thought process.)

I'm revising my ideas on the languages that businesses use. I'm keeping the existing entries and adding Codacy's. In the future, I will consider more than just C, C++, Java, and C#.

Tuesday, March 4, 2014

After Big Data comes Big Computing

The history of computing is a history of tinkering and revision. We excel at developing techniques to handle new challenges.

Consider the history of programming:

Tabulating machines

  • plug-boards with wires

Von Neumann architecture (mainframes)

  • machine language
  • assembly language
  • compilers (FORTRAN and COBOL)
  • interpreters (BASIC) and timeshare systems

The PC revolution (the IBM PC)

  • assembly language
  • Microsoft BASIC

The Windows age

  • Object-oriented programming
  • Event-driven programming
  • Visual Basic

Virtual machines

  • UCSD p-System
  • Java and the JVM

Dynamic languages

  • Perl
  • Python
  • Ruby
  • Javascript

This (severely abridged) list of hardware and programming styles shows how we change our technology. Our progress is not a smooth advance from one level to the present, but a series of jumps, some of them quite large. It was a large jump from plug-boards to memory-resident programs. It was another large jump to an assembler. One can argue that later jumps were larger or smaller, but those arguments are not important to the basic idea.

Notice that we do not know where things are going. We do not see the entire chain up front. In the 1950s, we did not know that we would end up here (in 2014) with dynamic languages and cloud computing. Often we cannot see the next step until it is upon us and only the best of visionaries can see past it.

Big Data is such a jump, enabled by cheap storage and cloud computing. That change in technology is upon us.

Big Data is the acquisition and storage (and use) of large quantities of data. Not just "lots of data" but mind-boggling quantities of data. Data that makes our current "very large" databases look small and puny. Data that contains not only financial transactions but server logs, e-mails, security videos, medical records, and sensor readings from just about any kind of device. (The sensor readings may be from building sensors for temperature, from vehicles for position and speed and engine performance, from packages in transit, from assembly lines, from gardens and parks for temperature and humidity, ... the list is endless.)

But what happens once we acquire and store these mind-boggling heaps of data?

The obvious solution is to do something with it. And we are doing something with it; we use tools like Hadoop to process and analyze and visualize it.

I think Hadoop (and its brethren) are a good start. We're at the dawn of the "Big Data Age", and we don't really know what we want -- in terms of analyses and tools. We have some tools, and they seem okay.

But this is just the dawn of the "Big Data Age". I think we will develop new techniques and tools to analyze our data. And, I suspect those tools and techniques will require lots of computation. So much computation that someone will coin the term "Big Computing" to represent the use of mind-boggling amounts of computing power.

Big Computing seems a natural follow-on to Big Data. And just as we have developed languages to handle new programming challenges, we will develop new languages for Big Computing.

We have two hints for programming in the era of Big Computing. One hint is cloud computing, with its ability to scale up as we need more power. We've already seen that programs for the cloud have a different organization than "classic" programs. Cloud programs use small modules connected by message queues. The modules hold no state, which allows the system to route transactions to any available module.
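
As a minimal sketch of that style (the message and handler are hypothetical, and the business rule is invented), a cloud module is little more than a function applied to whatever message arrives from the queue:

#include <string>

struct transaction
{
    std::string account;
    double amount;
};

struct receipt
{
    std::string account;
    double fee;
};

// a stateless "module": everything it needs arrives in the message,
// so any instance, on any server, can handle any transaction
receipt handle_transaction ( const transaction & t )
{
    receipt r;
    r.account = t.account;
    r.fee = t.amount * 0.01;   // hypothetical business rule
    return r;
}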

The other hint is at the small end of the computing world, at the chip level. Here we see advances in processor design: more cores, more caching, more processing. The GreenArrays GA144 is a chip that contains 144 computers -- not cores, but computers. This is another contender for Big Computing.

I'm not sure what "Big Computing" and its programming will look like, but I am confident that they will be interesting!