Wednesday, November 17, 2010
Just like a toaster
Sunday, November 14, 2010
Java's necessary future
Java is an interesting technology. It proved that virtual processors were feasible. (Java was not the first; the UCSD p-System was a notable predecessor. But Java was actually practical, whereas the earlier attempts were not.) But Java has aged, and it needs not just a face-lift but a re-thinking of its role in the Oracle stack.
Here's my list of improvements for "Java 2.0":
- Revamp the virtual processor. The original JRE was custom-built for the Java language. Java 2.0 needs to embrace other languages, including COBOL, FORTRAN, LISP, Python, and Ruby.
- Expand the virtual processor to support functional languages, including the new up-and-coming languages of Haskell and Erlang. This will help LISP, Python, and Ruby, too.
- Make the JRE more friendly to virtualization environments like Oracle VM, VMWare, Parallels, Xen, and even Microsoft's Virtual PC and Virtual Server.
- Contribute to the Eclipse IDE, and make it a legitimate player in the Oracle universe.
Java was the ground-breaker for virtual processor technologies. Like other ground-breakers such as FORTRAN, COBOL, and LISP, I think it will be around for a long time. Oracle can use this asset or discard it; the choice is theirs.
Thursday, November 11, 2010
Improve code with logging
I recently used a self-made logging class to improve my (and others') code. The improvements to code were a pleasant side-effect of the logging; I had wanted more information from the program, information that was not visible in the debugger, and wrote the logging class to capture and present that information. During my use of the logging class, I found the poorly structured parts of the code.
A logging class is a simple thing. My class has four key methods (Enter(), Exit(), Log(), and ToString()) and a few auxiliary methods. Each method writes information to a text file; one of the auxiliary methods specifies the file. Enter() captures the entry into a function; Exit() captures the return from the function; Log() adds an arbitrary message to the log file, including variable values; and ToString() converts our variables and structures to plain text. Combined, these methods let us capture the data we need.
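A minimal sketch of such a class, in the spirit of the one described (the file handling and the exact signatures are my own assumptions; the array overload anticipates the double* problem discussed below):

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>

class Logger
{
public:
    // Auxiliary method: specify the text file that receives the log.
    static void SetFile(const std::string &filename)
    {
        stream().open(filename.c_str(), std::ios::app);
    }

    // Record entry into a function.
    static void Enter(const std::string &name)
    {
        stream() << "** Enter: " << name << "\n";
    }

    // Record the return from a function.
    static void Exit(const std::string &name)
    {
        stream() << "** Exit: " << name << "\n";
    }

    // Add an arbitrary message, typically a variable name and its value.
    static void Log(const std::string &label, const std::string &value)
    {
        stream() << label << value << "\n";
    }

    // Convert a single value to text.
    static std::string ToString(double value)
    {
        std::ostringstream os;
        os << value;
        return os.str();
    }

    // Convert a whole array to text -- the view the debugger cannot give us.
    static std::string ToString(const double *values, std::size_t count)
    {
        std::ostringstream os;
        os << "[";
        for (std::size_t i = 0; i < count; ++i)
            os << (i ? " " : "") << values[i];
        os << "]";
        return os.str();
    }

private:
    static std::ofstream &stream()
    {
        static std::ofstream file;
        return file;
    }
};
```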
I use the class to capture information about the flow of a program. Some of this information is available in the debugger but some is not. We're using Microsoft's Visual Studio, a very capable IDE, but some run-time information is not available. The problem is due, in part, to our program and the data structures we use. The most common is an array of doubles, allocated by 'new' and stored in a double*. The debugger can see the first value but none of the rest. (Oh, it can if we ask for x[n], where 'n' is a specific number, but there is no way to see the whole array, and repeating the request for an array of 100 values is tiresome.)
Log files provide a different view of the run-time than the debugger. The debugger can show you values at a point in time, but you must run the program and stop at the correct point in time. And once there, you can see only the values at that point in time. The log file shows the desired values and messages in sequence, and it can extract the 100-plus values of an array into a readable form. A typical log file would be:
** Enter: foo()
i == 0
my_vars = [10 20 22 240 255 11 0]
** Enter: bar()
r_abs == 22
** Exit: bar()
** Exit: foo()
The log file contains the text that I specify, and nothing beyond it.
Log files give me a larger view than the debugger. The debugger shows values for a single point in time; the log file shows me the values over the life of the program. I can see trends much more easily with the log files.
But enough of the direct benefits of log files. Beyond showing me the run-time values of my data, they help me build better code.
Log files help me with code design by identifying the code that is poorly structured. I inject the logging methods into my code, instrumenting it. The function
double Foobar::square(double value)
{
return (value * value);
}
becomes
double Foobar::square(double value)
{
Logger::Enter("Foobar::square(double)");
Logger::Log("value: ", Logger::ToString(value));
double result = value * value;
Logger::Exit("Foobar::square(double)");
return result;
}
A bit verbose, and perhaps a little messy, but it gets the job done. The log file will contain lines for every invocation of Foobar::square().
Note that each instrumented function has a pair of methods: Enter() and Exit(). It's useful to know when each function starts and ends.
For the simple function above, one Enter() and one Exit() are needed. But for more complex functions, multiple Exit() calls are needed. For example:
double Foobar::square_root(double value)
{
if (value < 0.0)
return 0.0;
if (value == 0.0)
return 0.0;
return (pow(value, 0.5));
}
The instrumented version of this function must include not one but three calls to Exit(), one for each return statement.
double Foobar::square_root(double value)
{
Logger::Enter("Foobar::square_root(double)");
Logger::Log("value: ", Logger::ToString(value));
if (value < 0.0)
{
Logger::Exit("Foobar::square_root(double)");
return 0.0;
}
if (value == 0.0)
{
Logger::Exit("Foobar::square_root(double)");
return 0.0;
}
Logger::Exit("Foobar::square_root(double)");
return (pow(value, 0.5));
}
Notice all of the extra work needed to capture the multiple exits of this function. This extra work is a symptom of poorly designed code.
In the days of structured programming, the notion of simplified subroutines was put forward. It stated that each subroutine ("function" or "method" in today's lingo) should have only one entry point and only one exit point. This rule seems to have been dropped.
Or at least the "only one exit point" portion of the rule has been dropped. Modern-day languages allow only one entry point into a method, but they allow multiple exit points, and this lets us write poor code. A better (uninstrumented) version of the square root method is:
double Foobar::square_root(double value)
{
double result = 0.0;
if (is_rootable(value))
{
result = pow(value, 0.5);
}
return result;
}
bool Foobar::is_rootable(double value)
{
return (value > 0.0);
}
This code is longer but more readable. Instrumenting it is less work, too.
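To show how much less work it is, here is the refactored method instrumented with a single Enter()/Exit() pair. This is a self-contained sketch: the minimal Logger stand-in and the free-function form of is_rootable are my simplifications so the example compiles on its own.

```cpp
#include <cmath>
#include <iostream>
#include <sstream>
#include <string>

// Minimal stand-ins; the real code would use the Logger class and
// Foobar members described earlier.
namespace Logger
{
    void Enter(const std::string &name) { std::cout << "** Enter: " << name << "\n"; }
    void Exit(const std::string &name)  { std::cout << "** Exit: "  << name << "\n"; }
    void Log(const std::string &label, const std::string &value)
    {
        std::cout << label << value << "\n";
    }
    std::string ToString(double value)
    {
        std::ostringstream os;
        os << value;
        return os.str();
    }
}

bool is_rootable(double value)
{
    return (value > 0.0);
}

// One Enter(), one Exit(): the single-exit structure needs no duplicated
// instrumentation, no matter how many conditions the function tests.
double square_root(double value)
{
    Logger::Enter("square_root(double)");
    Logger::Log("value: ", Logger::ToString(value));

    double result = 0.0;
    if (is_rootable(value))
    {
        result = pow(value, 0.5);
    }

    Logger::Exit("square_root(double)");
    return result;
}
```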
One can visually examine the code for the "extra return" problem, but instrumenting the code with my logging class made the problems immediately visible.
Sunday, November 7, 2010
Where Apple is falling behind
Yet in one aspect, Apple is falling behind.
Apple is using real processor technology, not the virtual processors that Microsoft and Google use. By "virtual processors", I mean the virtual layers that separate the application code from the physical processor. Microsoft has .NET with its virtual processor, its IL code, and its CLR. Google uses Java and Python, and those languages also have the virtual processing layer.
Most popular languages today have a virtual processing layer. Java uses its Java Virtual Machine (JVM). Perl, Python, and Ruby use their virtual processors.
But Apple uses Objective-C, which compiles to the physical processor. In this, Apple is alone.
Compiling to the physical processor has the advantage of performance. The virtual processors of the JVM and .NET (and Perl, and Python...) impose a performance cost. A lot of work has been done to reduce that cost, but the cost is still there. Microsoft's use of .NET for its Windows Mobile offerings means higher demands for processor power and higher draw from the battery. An equivalent Apple product can run with less power.
Compiling to a virtual processor also has advantages. The virtual environments can be opened to debuggers and performance monitors, something not possible with a physical processor. Therefore, writing a debugger or a performance monitor in a virtual processor environment is easier and less costly.
The languages which use virtual processors all have the ability for class introspection (or reflection, as some put it). I don't know enough about Objective-C to know if this is possible, but I do know that C and C++ don't have reflection. Reflection makes it easier to create unit tests and perform some type checking on code, which reduces the long-term cost of the application.
The other benefit of virtual processors is freedom from the physical processor, or rather from the processor line. Programs written to the virtual processor can run anywhere the virtual layer runs. This is how Java can run anywhere: the byte-code is the same; only the virtual processor changes from physical processor to physical processor.
The advantages of performance are no longer enough to justify a physical processor. Virtual processors have advantages that help developers and reduce the development costs.
Is it possible that Apple is working on a virtual processor of its own? The technology is well within their abilities.
I suspect that Apple would build their own, rather than use any of the existing virtual processors. The two biggies, .NET and Java, are owned by companies not friendly to Apple. The engines for Perl, Python, and Ruby are nice but perhaps not suitable to the Apple set of applications. An existing engine is not in Apple's best interests. They need their own.
Apple doesn't need the virtual processor engine immediately, but they will need it soon -- perhaps within two years. But there is more to consider.
Apple has pushed the Objective-C, C, and C++ languages for its development platform. For the iPhone, iPod, and iPad, it has all but banished other languages and technologies. But C, C++, and Objective-C are poorly suited for virtual processors. Apple will need a new language.
Given Apple's desire for reliable (that is, crash-free) applications, the functional languages may appeal to them. Look for a move from Objective-C to either Haskell or Erlang, or possibly a new language developed at Apple.
It will be a big shift for Apple, and their developers, but in the long run beneficial.