Monday, August 15, 2011

Iterating over a set is better than looping

When coding, I find it better to use the "foreach" iterator than the "for" loop.

The two are similar but not identical. A "for" loop repeats a fixed number of times; a "foreach" iteration is applied to a set and runs the contained code once for each member of the set. A "for" loop is often used to achieve the same goal, but nothing guarantees that its number of iterations matches the size of the set. A "foreach" iteration is guaranteed to match the set.
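To make the difference concrete, here is a minimal sketch in Java, whose enhanced "for" statement plays the role of "foreach". The class name, method names, and sample data are invented for illustration. The bound of the index loop is stored apart from the list, so it goes stale when the list grows; the iterator asks the list itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LoopCountDemo {
    // Index loop: the bound is a separate number that someone must keep in sync.
    static int countWithFor(List<String> names, int assumedSize) {
        int iterations = 0;
        for (int i = 0; i < assumedSize; i++) {
            iterations++; // names.get(i) would throw if assumedSize exceeded names.size()
        }
        return iterations;
    }

    // Iterator loop: the list itself decides how many times the body runs.
    static int countWithForEach(List<String> names) {
        int iterations = 0;
        for (String name : names) {
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList("a", "b", "c"));
        int assumedSize = 3; // correct at the time it was written
        names.add("d");      // the set grows, the stored bound does not

        System.out.println(countWithFor(names, assumedSize)); // prints 3
        System.out.println(countWithForEach(names));          // prints 4
    }
}
```

The index loop silently misses the fourth member; the iterator cannot.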

For example, I was reviewing code with a colleague today. The code was:

for (int i = 0; i < max_size; i++)
{
    for (int j = 0; j < struct_size; j++, i++)
    {
        item[i] = // some value
    }
}

This is an unusual construct. It differs from the normal nested loop:
  • The inner loop increments both index values (i and j)
  • The inner loop contains assignments based on index i, but not j
Here the j loop serves only as a counter, while i is used as an index into the entire structure.

This is a fragile construct: max_size must hold the size of the entire structure "item". Normally max_size would hold the number of larger elements, each containing struct_size items. Anyone changing the size of item must understand (and remember) this bit of code, because either the loop or the initialization of max_size has to change with it.
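One way to see the fragility is to trace which indices the loop touches. The sketch below, in Java, copies the loop headers exactly as shown above but records each value of i instead of assigning to item[i]. The class name and the sizes 8, 6, and 3 are made up for the illustration, and it assumes the snippet above is a faithful transcription of the reviewed code.

```java
import java.util.ArrayList;
import java.util.List;

class NestedIndexTrace {
    // Returns every value of i at which the reviewed loop would assign item[i].
    static List<Integer> visitedIndices(int maxSize, int structSize) {
        List<Integer> visited = new ArrayList<>();
        for (int i = 0; i < maxSize; i++) {
            for (int j = 0; j < structSize; j++, i++) {
                visited.add(i); // stands in for: item[i] = some value
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(visitedIndices(8, 3)); // prints [0, 1, 2, 4, 5, 6]
        System.out.println(visitedIndices(6, 3)); // prints [0, 1, 2, 4, 5, 6]
    }
}
```

With a total of 8 and a group size of 3, indices 3 and 7 are never visited, because the outer i++ still runs after each inner pass. With a total of 6, the second pass reaches index 6, one past the end of a six-element array, because the inner loop never checks i against max_size. Both behaviors depend on how the two sizes happen to relate, which is the kind of hidden coupling described above.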

Changing this code to "foreach" iterators would make it more robust. It also requires us to think about the structures involved. In the previous code, all we know is that we have "max_size" number of items. If the set is truly a linear set, then a single "foreach" is enough to initialize them. (So is a single "for" loop.) If the set actually consists of sets of items (a set within a set), then we have code that looks like:

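foreach (Item_set i in larger_set)
{
    foreach (Item j in i)
    {
        // the iteration variable itself is read-only in C#, so set a
        // member instead ("value" is a placeholder name)
        j.value = // some value
    }
}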

Of course, once you make this transformation, you often want to change the variable names. The names "i" and "j" are useful for indices, but with iterators we can use names that represent the actual structures:

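foreach (Item_set item_set in larger_set)
{
    foreach (Item item in item_set)
    {
        // the iteration variable itself is read-only in C#, so set a
        // member instead ("value" is a placeholder name)
        item.value = // some value
    }
}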
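For readers who want something they can run, here is a rough Java equivalent of the nested iteration. The Item class, its value field, and the build helper are all hypothetical, since the post does not say what an item holds or how the sets are created. Note that neither loop mentions a size.

```java
import java.util.ArrayList;
import java.util.List;

class NestedForEachDemo {
    // Hypothetical item type; "value" is a placeholder field.
    static class Item {
        int value;
    }

    // Hypothetical helper: builds setCount item sets of itemsPerSet items each.
    static List<List<Item>> build(int setCount, int itemsPerSet) {
        List<List<Item>> largerSet = new ArrayList<>();
        for (int s = 0; s < setCount; s++) {
            List<Item> itemSet = new ArrayList<>();
            for (int k = 0; k < itemsPerSet; k++) {
                itemSet.add(new Item());
            }
            largerSet.add(itemSet);
        }
        return largerSet;
    }

    // The nested iteration from the post: visit every item of every item set.
    static void initialize(List<List<Item>> largerSet, int value) {
        for (List<Item> itemSet : largerSet) {
            for (Item item : itemSet) {
                item.value = value; // update the object, not the loop variable
            }
        }
    }

    public static void main(String[] args) {
        List<List<Item>> largerSet = build(2, 3);
        initialize(largerSet, 7);
        System.out.println(largerSet.get(1).get(2).value); // prints 7
    }
}
```

Changing the number of sets or the number of items per set requires no change to initialize.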

Changing from "for" to "foreach" forces us to think about the true structure of our data and align our code with that structure. It encourages us to pick meaningful names for our iteration operations. Finally, it gives us code that is more robust and resilient to change.

I think that this is a win all the way around.
