Thursday, November 11, 2010

Improve code with logging

I recently used a self-made logging class to improve my (and others') code. The improvements to code were a pleasant side-effect of the logging; I had wanted more information from the program, information that was not visible in the debugger, and wrote the logging class to capture and present that information. During my use of the logging class, I found the poorly structured parts of the code.

A logging class is a simple thing. My class has four key methods (Enter(), Exit(), Log(), and ToString() ) and a few auxiliary methods. Each method writes information to a text file. (The text file being specified by one of the auxiliary methods.) Enter() is used to capture the entry into a function; Exit() captures the return from the function; Log() adds an arbitrary message to the log file, including variable values; and ToString() converts our variables and structures to plain text. Combined, these methods let us capture the data we need.

I use the class to capture information about the flow of a program. Some of this information is available in the debugger but some is not. We're using Microsoft's Visual Studio, a very capable IDE, but some run-time information is not available. The problem is due, in part, to our program and the data structures we use. The most common is an array of doubles, allocated by 'new' and stored in a double*. The debugger can see the first value but none of the rest. (Oh, it can if we ask for x[n], where 'n' is a specific number, but there is no way to see the whole array, and repeating the request for an array of 100 values is tiresome.)

Log files provide a different view of the run-time than the debugger. The debugger can show you values at a point in time, but you must run the program and stop at the correct point in time. And once there, you can see only the values at that point in time. The log file shows the desired values and messages in sequence, and it can extract the 100-plus values of an array into a readable form. A typical log file would be:

** Enter: foo()
i == 0
my_vars = [10 20 22 240 255 11 0]
** Enter: bar()
r_abs == 22
** Exit: bar()
** Exit: foo()

The log file contains the text that I specify, and nothing beyond it.

Log files give me a larger view than the debugger. The debugger shows values for s single point in time; the log file shows me the values over the life of the program. I can see trends much easier with the log files.

But enough of the direct benefits of log files. Beyond showing me the run-time values of my data, they help me build better code.

Log files help me with code design by identifying the code that is poorly structured. I inject the logging methods into my code, instrumenting it. The function

double Foobar::square(double value)
{
return (value * value);
}

Becomes

double Foobar::square(double value)
{
Logger::Enter("Foobar::square(double)");
Logger::Log("value: ", Logger::ToString(value));
return (value * value);
Logger::Exit("Foobar::square(double)");
}

A bit verbose, and perhaps a little messy, but it gets the job done. The log file will contains lines for every invocation of Foobar::square().

Note that each instrumented function has a pair of methods: Enter() and Exit(). It's useful to know when each function starts and ends.

For the simple function above, one Enter() and one Exit() is needed. But for more complex functions, multiple Exit() calls are needed. For example:

double Foobar::square_root(double value)
{
if (value < 0.0)
return 0.0;
if (value == 0.0)
return 0.0;
return (pow(value, 0.5));
}

The instrumented version of this function must include not one but calls to Exit() for each return statement.

double Foobar::square_root(double value)
{
Logger::Enter("Foobar::square_root(double)");
Logger::Log("value: ", Logger::ToString(value));
if (value < 0.0)
{
Logger::Exit("Foobar::square_root(double)");
return 0.0;
}
if (value == 0.0)
{
Logger::Exit("Foobar::square_root(double)");
return 0.0;
}
Logger::Exit("Foobar::square_root(double)");
return (pow(value, 0.5));
}

Notice all of the extra work needed to capture the multiple exits of this function. This extra work is a symptom of poorly designed code.

In the days of structured programming, the notion of simplified subroutines was put forward. It stated that each subroutine ("function" or "method" in today's lingo) should have only one entry point and only one exit point. This rule seems to have been dropped.

At least the "only one exit point" portion of the rule. Modern day languages allow for only one entry point into a method. They allow for multiple exit points, and this lets us write poor code. A better (uninstrumented) version of the square root method is:

double Foobar::square_root(double value)
{
double result = 0.0;

if (is_rootable(value))
{
result = pow(value, 0.5);
}

return result;
}

bool Foobar::is_rootable(double value)
{
return (value > 0.0);
}

This code is longer but more readable. Instrumenting it is less work, too.

One can visually examine the code for the "extra return" problem, but instrumenting the code with my logging class made the problems immediately visible.


No comments: