Monday, November 29, 2010
The return of frugality
Sunday, November 28, 2010
Measuring the load
Wednesday, November 24, 2010
The line doesn't move; the people in the line do
Tuesday, November 23, 2010
Getting from here to there
In the pre-PC era, there was no single vendor of hardware and software, and no single standard format for exchangeable media (also known as "floppy disks"). The de facto standard for an operating system was CP/M, but Apple had DOS and Radio Shack had TRS-DOS, and the UCSD p-System was lurking in corners.
Even with CP/M as a standard across the multitude of hardware platforms (Imsai, Sol, North Star, Heathkit, etc.), the floppy disk formats varied. These floppy disks were true floppies, in either 8-inch or 5.25-inch forms, with differences in track density, track count, and even the number of sides. There were single-sided disks and double-sided disks. There were 48-tpi (tracks per inch) disks and 96-tpi disks. Tracks were recorded in single density, double density, quad density, and extended density.
Moving data from one computer to another was an art, not a science, and most definitely not easy. It was all too common to have data on one system and desire it on another computer. Truly, these early computers were islands of automation.
Yet the desire to move data won out. We used null modem cables, bulletin board systems, and specially-written software to read "foreign" disks. (The internet existed at the time, but not for the likes of us hobbyists.)
Over time, we replaced the null modem cables and bulletin board systems with real networks. Today, we think nothing of moving data. Indeed, the cell phone business is the business of moving data!
The situation with cloud computing is similar. Clouds can hold data and applications, but we're not in a position to move data from one cloud to another. Well, not easily. One can dump a MySQL database to a text file, FTP it to a new site, and then import it into a new MySQL database; this is the modern-day equivalent of the null modem cables of yore.
Data exchange (for PCs) grew over a period of twenty years, from the early microcomputers, to the first IBM PC, to the releases of NetWare, Windows for Workgroups, IBM OS/2, Windows NT, and eventually Windows Server. The big advances came when large players arrived on the scene: first IBM with an open hardware platform that allowed for network cards, and later Novell and Microsoft with closed software platforms that established standards (or used existing ones).
I expect that data exchange for cloud apps will follow a similar track. Unfortunately, I also expect that it will take a similar period of time.
Sunday, November 21, 2010
Just how smart is your process?
The American mindset is one of process over skill. Define a good process and you don't need talented (that is, expensive) people. Instead of creative people, you can staff your teams with non-creative (that is, low wage) employees, and still get the same results. Or so the thinking goes.
The trend goes back to the scientific management movement of the early twentieth century.
For some tasks, the de-skilling of the workforce may make sense. Jobs that consist of repeated, well-defined steps, jobs with no unexpected factors, jobs that require no thought or creativity, can be handled by a process.
The creation of software is generally unrepeatable, has poorly-defined steps, has unexpected factors and events, and requires a great deal of thought. Yet many organizations (especially large organizations) attempt to define processes to make software development repeatable and predictable.
These organizations confuse software development with the project of software development. While the act of software development is unpredictable, a project for software development can be fit into a process. The project management tasks (status reports, personnel assignment, skills assessment, cost calculations, etc.) can be made routine. You most likely want them routine and standardized, to allow for meaningful comparison of one project to another.
Yet the core aspect of software development remains creative, and you cannot create a process for creative acts. (Well, you can create a process, and inflict it upon your people, but you will have little success with it.) Programming is more art than science, and by definition an art lies outside the realm of repeatable processes.
Some organizations define a process that uses very specific requirements or design documents, removing all ambiguity and leaving the programming to low-skilled individuals. While this method appears to solve the "programming is an art" problem, it merely shifts the creative aspect to another group of individuals. This group (usually the "architects", "chief engineers", "tech leads", or "analysts") is doing the actual programming. (Perhaps not programming in FORTRAN or C#, but programming in English.) Shifting the creative work away from the coders introduces several problems, including the risk of poor run-time performance and the risk of specifying impossible solutions. Coders, the folks who wrangle the compiler, have the advantage of knowing that their solutions will either work or not work -- the computer tells them so. Architects and analysts who "program in English" have no such accurate and absolute feedback.
Successful management of software development consists not of reducing every task to a well-defined, repeatable set of steps, but of dividing tasks into the "repeatable" and "creative" groups, and managing both groups. For the repeatable tasks, use tools and techniques to automate the tasks and make them as friction-free as possible. For the creative tasks, provide well-defined goals and allow your teams to work on imaginative solutions.
Thursday, November 18, 2010
The new new thing
Wednesday, November 17, 2010
Just like a toaster
Sunday, November 14, 2010
Java's necessary future
Java is an interesting technology. It proved that virtual processors were feasible. (Java was not the first; the UCSD p-System was a notable predecessor. But Java was actually practical, whereas the earlier attempts were not.) But Java has aged, and needs not just a face-lift but a re-thinking of its role in the Oracle stack.
Here's my list of improvements for "Java 2.0":
- Revamp the virtual processor. The original JRE was custom-built for the Java language. Java 2.0 needs to embrace other languages, including COBOL, FORTRAN, LISP, Python, and Ruby.
- Expand the virtual processor to support functional languages, including the up-and-coming Haskell and Erlang. This will help LISP, Python, and Ruby, too.
- Make the JRE more friendly to virtualization environments like Oracle VM, VMWare, Parallels, Xen, and even Microsoft's Virtual PC and Virtual Server.
- Contribute to the Eclipse IDE, and make it a legitimate player in the Oracle universe.
Java was the ground-breaker for virtual processor technologies. Like other ground-breakers such as FORTRAN, COBOL, and LISP, I think it will be around for a long time. Oracle can use this asset or discard it; the choice is theirs.
Thursday, November 11, 2010
Improve code with logging
I recently used a self-made logging class to improve my (and others') code. The improvements to code were a pleasant side-effect of the logging; I had wanted more information from the program, information that was not visible in the debugger, and wrote the logging class to capture and present that information. During my use of the logging class, I found the poorly structured parts of the code.
A logging class is a simple thing. My class has four key methods (Enter(), Exit(), Log(), and ToString()) and a few auxiliary methods. Each method writes information to a text file; the text file is specified by one of the auxiliary methods. Enter() captures the entry into a function; Exit() captures the return from the function; Log() adds an arbitrary message to the log file, including variable values; and ToString() converts our variables and structures to plain text. Combined, these methods let us capture the data we need.
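Here is a rough sketch of what such a class might look like. The method names match the ones described above; the SetLogFile() helper and the array overload of ToString() are illustrative assumptions, not necessarily the exact code:

#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>

class Logger
{
public:
    // Auxiliary method (assumed name): choose the text file that receives the log.
    static void SetLogFile(const std::string & filename)
    {
        file().open(filename.c_str(), std::ios::out | std::ios::app);
    }

    // Record entry into a function.
    static void Enter(const std::string & function)
    {
        file() << "** Enter: " << function << std::endl;
    }

    // Record return from a function.
    static void Exit(const std::string & function)
    {
        file() << "** Exit: " << function << std::endl;
    }

    // Write an arbitrary labelled message, such as a variable value.
    static void Log(const std::string & label, const std::string & text)
    {
        file() << label << text << std::endl;
    }

    // Convert a single value to text.
    static std::string ToString(double value)
    {
        std::ostringstream buffer;
        buffer << value;
        return buffer.str();
    }

    // Convert an array of doubles (the troublesome double* case) to text.
    static std::string ToString(const double * values, std::size_t count)
    {
        std::ostringstream buffer;
        buffer << "[";
        for (std::size_t i = 0; i < count; ++i)
            buffer << (i ? " " : "") << values[i];
        buffer << "]";
        return buffer.str();
    }

private:
    // Single shared output stream for the whole program.
    static std::ofstream & file()
    {
        static std::ofstream stream;
        return stream;
    }
};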
I use the class to capture information about the flow of a program. Some of this information is available in the debugger but some is not. We're using Microsoft's Visual Studio, a very capable IDE, but some run-time information is not available. The problem is due, in part, to our program and the data structures we use. The most common is an array of doubles, allocated by 'new' and stored in a double*. The debugger can see the first value but none of the rest. (Oh, it can if we ask for x[n], where 'n' is a specific number, but there is no way to see the whole array, and repeating the request for an array of 100 values is tiresome.)
Log files provide a different view of the run-time than the debugger. The debugger can show you values at a point in time, but you must run the program and stop at the correct point in time. And once there, you can see only the values at that point in time. The log file shows the desired values and messages in sequence, and it can extract the 100-plus values of an array into a readable form. A typical log file would be:
** Enter: foo()
i == 0
my_vars = [10 20 22 240 255 11 0]
** Enter: bar()
r_abs == 22
** Exit: bar()
** Exit: foo()
The log file contains the text that I specify, and nothing beyond it.
Log files give me a larger view than the debugger. The debugger shows values for a single point in time; the log file shows me the values over the life of the program. I can see trends much more easily with the log files.
But enough of the direct benefits of log files. Beyond showing me the run-time values of my data, they help me build better code.
Log files help me with code design by identifying the code that is poorly structured. I inject the logging methods into my code, instrumenting it. The function
double Foobar::square(double value)
{
    return (value * value);
}
becomes
double Foobar::square(double value)
{
    Logger::Enter("Foobar::square(double)");
    Logger::Log("value: ", Logger::ToString(value));
    Logger::Exit("Foobar::square(double)");
    return (value * value);
}
A bit verbose, and perhaps a little messy, but it gets the job done. The log file will contain lines for every invocation of Foobar::square().
Note that each instrumented function has a pair of methods: Enter() and Exit(). It's useful to know when each function starts and ends.
For the simple function above, one Enter() and one Exit() are needed. But for more complex functions, multiple Exit() calls are needed. For example:
double Foobar::square_root(double value)
{
    if (value < 0.0)
        return 0.0;
    if (value == 0.0)
        return 0.0;
    return (pow(value, 0.5));
}
The instrumented version of this function must include not one but three calls to Exit(), one for each return statement.
double Foobar::square_root(double value)
{
    Logger::Enter("Foobar::square_root(double)");
    Logger::Log("value: ", Logger::ToString(value));
    if (value < 0.0)
    {
        Logger::Exit("Foobar::square_root(double)");
        return 0.0;
    }
    if (value == 0.0)
    {
        Logger::Exit("Foobar::square_root(double)");
        return 0.0;
    }
    Logger::Exit("Foobar::square_root(double)");
    return (pow(value, 0.5));
}
Notice all of the extra work needed to capture the multiple exits of this function. This extra work is a symptom of poorly designed code.
In the days of structured programming, the notion of simplified subroutines was put forward. It stated that each subroutine ("function" or "method" in today's lingo) should have only one entry point and only one exit point. This rule seems to have been dropped.
At least, the "only one exit point" portion of the rule has been dropped. Modern-day languages allow only one entry point into a method, but they allow multiple exit points, and this lets us write poor code. A better (uninstrumented) version of the square root method is:
double Foobar::square_root(double value)
{
    double result = 0.0;
    if (is_rootable(value))
    {
        result = pow(value, 0.5);
    }
    return result;
}

bool Foobar::is_rootable(double value)
{
    return (value > 0.0);
}
This code is longer but more readable. Instrumenting it is less work, too.
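As a sketch (using the same Logger calls as before), the instrumented version of the refactored method needs only a single Enter()/Exit() pair:

double Foobar::square_root(double value)
{
    Logger::Enter("Foobar::square_root(double)");
    Logger::Log("value: ", Logger::ToString(value));
    double result = 0.0;
    if (is_rootable(value))
    {
        result = pow(value, 0.5);
    }
    Logger::Exit("Foobar::square_root(double)");
    return result;
}

The is_rootable() helper can be instrumented the same way, should its behavior ever be in question.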
One can visually examine the code for the "extra return" problem, but instrumenting the code with my logging class made the problems immediately visible.
Sunday, November 7, 2010
Where Apple is falling behind
In one respect, though, Apple is falling behind.
Apple is using real processor technology, not the virtual processors that Microsoft and Google use. By "virtual processors", I mean the virtual layers that separate the application code from the physical processor. Microsoft has .NET with its virtual processor, its IL code, and its CLR. Google uses Java and Python, and those languages also have the virtual processing layer.
Most popular languages today have a virtual processing layer. Java uses its Java Virtual Machine (JVM). Perl, Python, and Ruby use their virtual processors.
But Apple uses Objective-C, which compiles to the physical processor. In this, Apple is alone.
Compiling to the physical processor has the advantage of performance. The virtual processors of the JVM and .NET (and Perl, and Python...) impose a performance cost. A lot of work has been done to reduce that cost, but the cost is still there. Microsoft's use of .NET for its Windows Mobile offerings means higher demands for processor power and a higher draw on the battery. An equivalent Apple product can run with less power.
Compiling to a virtual processor also has advantages. The virtual environment can be opened to debuggers and performance monitors in ways that are difficult with a physical processor, so writing a debugger or a performance monitor for a virtual processor environment is easier and less costly.
The languages that use virtual processors all support class introspection (or reflection, as some call it). I don't know enough about Objective-C to know if this is possible, but I do know that C and C++ don't have reflection. Reflection makes it easier to create unit tests and perform some type checking on code, which reduces the long-term cost of the application.
The other benefit of virtual processors is freedom from the physical processor, or rather from the processor line. Programs written to the virtual processor can run anywhere the virtual processor layer runs. This is how Java runs anywhere: the byte-code is the same; only the virtual processor changes from physical processor to physical processor.
The performance advantage is no longer enough to justify compiling to the physical processor. Virtual processors have advantages that help developers and reduce development costs.
Is it possible that Apple is working on a virtual processor of its own? The technology is well within their abilities.
I suspect that Apple would build their own, rather than use any of the existing virtual processors. The two biggies, .NET and Java, are owned by companies not friendly to Apple. The engines for Perl, Python, and Ruby are nice but perhaps not suitable to the Apple set of applications. An existing engine is not in Apple's best interests. They need their own.
Apple doesn't need the virtual processor engine immediately, but they will need it soon -- perhaps within two years. But there is more to consider.
Apple has pushed the Objective-C, C, and C++ languages for its development platform. For the iPhone, iPod, and iPad, it has all but banished other languages and technologies. But C, C++, and Objective-C are poorly suited for virtual processors. Apple will need a new language.
Given Apple's desire for reliable (that is, crash-free) applications, the functional languages may appeal to them. Look for a move from Objective-C to either Haskell or Erlang, or possibly a new language developed at Apple.
It will be a big shift for Apple and its developers, but in the long run a beneficial one.