Sunday, October 29, 2017

We have a problem

The Rust programming language has a problem.

The problem is one of compactness, or the lack thereof. This problem was brought to my attention by a blog post about the Unix 'yes' program.

In short, Rust requires a lot of code to handle a very simple task.

The simple task, in this case, is the "yes" program from Unix. This program writes the string "y" followed by a newline (or its first command-line argument, if one is supplied) to its output, over and over, as fast as it can.

Here's the program in C:
#include <stdio.h>

main(argc, argv)
char **argv;
{
  for (;;)
    printf("%s\n", argc>1? argv[1]: "y");
}
And here is an attempt in Rust:
use std::env;

fn main() {
  let expletive = env::args().nth(1).unwrap_or("y".into());
  loop {
    println!("{}", expletive);
  }
}
The Rust version is quite slow compared to the C version, so the author and others made some "improvements" to Make It Go Fast:
use std::env;
use std::io::{self, Write};
use std::process;
use std::borrow::Cow;

use std::ffi::OsString;
pub const BUFFER_CAPACITY: usize = 64 * 1024;

pub fn to_bytes(os_str: OsString) -> Vec<u8> {
  use std::os::unix::ffi::OsStringExt;
  os_str.into_vec()
}

fn fill_up_buffer<'a>(buffer: &'a mut [u8], output: &'a [u8]) -> &'a [u8] {
  // If the text is already bigger than half the buffer, just write it as-is.
  if output.len() > buffer.len() / 2 {
    return output;
  }

  // Copy the text once, then double it in place until the buffer is at
  // least half full, so each write pushes a large block at a time.
  let mut buffer_size = output.len();
  buffer[..buffer_size].clone_from_slice(output);

  while buffer_size < buffer.len() / 2 {
    let (left, right) = buffer.split_at_mut(buffer_size);
    right[..buffer_size].clone_from_slice(left);
    buffer_size *= 2;
  }

  &buffer[..buffer_size]
}

fn write(output: &[u8]) {
  let stdout = io::stdout();
  let mut locked = stdout.lock();
  let mut buffer = [0u8; BUFFER_CAPACITY];

  let filled = fill_up_buffer(&mut buffer, output);
  while locked.write_all(filled).is_ok() {}
}

fn main() {
  write(&env::args_os().nth(1).map(to_bytes).map_or(
    Cow::Borrowed(
      &b"y\n"[..],
    ),
    |mut arg| {
      arg.push(b'\n');
      Cow::Owned(arg)
    },
  ));
  process::exit(1);
}
Now, that's a lot of code. Really a lot. For a simple task.

To be fair, the author mentions that the GNU version of 'yes' weighs in at 128 lines, more than twice this monstrosity in Rust. But another blogger posted this C code, which improves performance:
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define LEN 2
#define TOTAL 8192
int main() {
    char yes[LEN] = {'y', '\n'};
    char *buf = malloc(TOTAL);
    int bufused = 0;
    while (bufused < TOTAL) {
        memcpy(buf+bufused, yes, LEN);
        bufused += LEN;
    }
    while(write(1, buf, TOTAL));
    return 1;
}

Programming languages should be saving us work. The high-performance solution in Rust is far too long for such a simple operation.
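
To be fair to the language, much of that length is hand-rolled output buffering rather than Rust itself. Here is a rough sketch (mine, not from either post) that leans on the standard library's BufWriter; it should recover much of the speed in far fewer lines, though it likely will not match the hand-tuned version:
use std::env;
use std::io::{self, BufWriter, Write};

fn main() {
  // Same behavior as before: print the first argument, or "y", forever.
  let expletive = env::args().nth(1).unwrap_or("y".into());
  let stdout = io::stdout();
  // Take the lock once and buffer output in 64 KB chunks, instead of
  // locking and flushing line by line through println!.
  let mut out = BufWriter::with_capacity(64 * 1024, stdout.lock());
  // Stop when the reader goes away (for example, `yes | head` closing the pipe).
  while writeln!(out, "{}", expletive).is_ok() {}
}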

We have a problem. It may be in our programming languages. It may be in run-time libraries. It may be in the operating systems and their APIs. It may be in the hardware architecture. It may be a combination of several.

But a problem we have.

Sunday, October 15, 2017

Don't make things worse

We make many compromises in IT.

When we compromise, there is one rule we should keep: don't make things worse. Don't break more things to accommodate one broken thing.

On one project in my history, we had an elderly Windows application that used some data files. The application was designed so that the data files had to reside in the same directory as the executable. This varies from the typical design for a Windows application, which stores its data files in a user-writeable location. Writing to C:\Program Files should be done only by install programs, and only with elevated privileges.

Fortunately, the data files were read by the application but not written, so we did not have to grant the application write access to a location under C:\Program Files. The program could run, with its unusual configuration, and no harm was done.

But things change, and there came a time when this application had to share data with another application, and that application *did* write to its data files.

The choices were to violate generally-accepted security practice (in one of two ways) or to modify the offending application. We could grant the second application write permission to C:\Program Files (actually, to a subdirectory, but still a variance from good security).

Or, we could install the original application in a different location, one in which the second application could write to the data files. This, too, is a variance from good security. Executables are locked away in C:\Program Files for a reason -- they are the targets of malware, and Windows guards that directory. (True, Windows does look for malware in all directories, but it's better for executables to be locked away until needed.)

Our third option was to modify the original application. This we could do; we had the source code and we had built it in the past. The code was not in the best of shape, and small changes could break things, but we did have experience with changes and a battery of tests to back us up.

In the end, we selected the third option. This was the best option, for a number of reasons.

First, it moved the original application closer to the standard model for Windows. (There were other things we did not fix, so the application is not perfect.)

Second, it allowed us to follow accepted procedures for our Windows systems.

Finally, it prevented the spread of bad practices. Compromising security to accommodate a poorly-written application is a dangerous path. It expands the "mess" of one application into the configuration of the operating system. Better to contain the mess and not let it grow.

We were lucky. We had an option to fix the problem application and maintain security. We had the source code for the application and knowledge about the program. Sometimes the situation is not so nice, and a compromise is necessary.

But whenever possible, don't make things worse.

Sunday, October 8, 2017

The Amazon Contest

Allow me to wander from my usual space of technology and share some thoughts on the Amazon.com announcement.

The announcement is for their 'HQ2', an office building (or complex) with 50,000 well-paid employees that is up for grabs to a lucky metropolitan area. City planners across the country are salivating over winning such an employer for their struggling town. Amazon has announced the criteria that they will consider for the "winner", including educated workforce, transit, and cost of living.

The one thing that I haven't seen is an analysis of the workforce numbers. From this one factor alone, we can narrow the hopeful cities to a handful.

Amazon.com wants their HQ2 complex to employ 50,000 people. That means that they will either hire the people locally or relocate them. Let's assume that they relocate one-third of the employees in HQ2. (The relocated could be current employees at other Amazon.com offices or new hires from out of the HQ2 area.)

That leaves about 33,000 people to hire. Assuming that they hire half as entry-level, they will need the other half to be experienced. (I'm assuming that Amazon.com will not relocate entry-level personnel.)

The winning city will have to supply 16,000 experienced professionals and 16,000 entry-level people. That's not an easy lift, and not one that many cities can offer. It means that the city (or metro area) must have a large population of professionals -- larger than 16,000 because not everyone will be willing to leave their current position and enlist with Amazon.com. (And Amazon.com may be unwilling to hire all candidates.)

If we assume that only one in ten professionals are willing to move, then Amazon.com needs a metro area with at least 160,000 professionals. (Or, if Amazon.com expected to pick one in ten candidates, the result is the same.)
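
Put as a quick calculation (using the assumed ratios above: one-third relocated, half of the local hires experienced, one in ten professionals willing to move), the numbers work out like this:
fn main() {
  // The assumptions from the text, not data from Amazon.com.
  let headcount = 50_000.0_f64;             // planned HQ2 employment
  let relocated = headcount / 3.0;          // assume one-third are relocations
  let local_hires = headcount - relocated;  // leaves roughly 33,000 to hire
  let experienced = local_hires / 2.0;      // half experienced: roughly 16,000
  let pool_needed = experienced / 0.10;     // one in ten willing: roughly 160,000
  println!("relocated:         {:.0}", relocated);
  println!("local hires:       {:.0}", local_hires);
  println!("experienced hires: {:.0}", experienced);
  println!("professional pool: {:.0}", pool_needed);
}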

And don't forget the relocated employees. They will need housing -- middle-class, ready-to-own housing, not "fixer uppers" or "investment opportunities". A few relocatees may choose the "buy and invest" option, but most are going to want a house that is ready to go. How many cities have 15,000 modern housing units available?

These two numbers -- available housing and available talent -- set the entrance fee. Without them, metro areas cannot compete, no matter how good the schools or the transit system or the tax abatement.

So when Amazon.com announces the location of HQ2, I won't be surprised if it has a large population of professionals and a large supply of housing. I also won't be surprised if it doesn't have some of the other attributes that Amazon.com put on the list, such as incentives and tax structure.

Wednesday, October 4, 2017

Performance, missing and found

One of the constants in technology has been the improvement of performance. More powerful processors, faster memory, larger capacity in physically smaller disks, and faster communications have been the results of better technology.

This increase in performance is mostly mythological. We are told that our processors are more powerful, we are told that memory and network connections are faster. Yet what is our experience? What are the empirical results?

For me, word processors and spreadsheets run just as fast as they did decades ago. Operating systems load just as fast -- or just as slow.

Linux on my 2006 Apple MacBook loads slower than 1980s-vintage systems with eight-bit processors and floppy disk drives. Windows loads quickly, sort of. It displays a log-on screen and lets me enter a name and password, but then it takes at least five minutes (and sometimes an hour) updating various things.

Compilers and IDEs suffer the same fate. Each new version of Visual Studio takes longer to load. Eclipse is no escape -- it has always required a short eternity to load and edit a file. Slow performance is not limited to loading; compilation times have improved, but only slightly, and not by the orders of magnitude needed to match the advertised improvements in hardware.

Where are the improvements? Where is the blazing speed that our hardware manufacturers promise?

I recently found that "missing" performance. It was noted in an article on the longevity of the C language, of all things. The author clearly and succinctly describes C and its place in the world. On the way, he describes the performance of one of his own C programs:
"In 1987 this code took around half an hour to run, today 0.03 seconds."
And there it is. A description of the performance improvements we should see in our systems.

The performance improvements we expect from better hardware have gone into software.

We have "invested" that performance in our operating systems, our programming languages, and our user interfaces. Instead of taking all of the improvements as reduced running times, we have diverted performance to new languages and to "improvements" in older languages. We invested in STL over plain old C++, in Java over C++ (with or without STL), in Python over Java and C#.

Why not? It's better to prevent mistakes than to have fast-running programs that crash or -- worse -- don't crash but produce incorrect results. Our processors are faster, and our programming languages do more for us. Boot times, load times, and compile times may be about the same as decades ago, but errors are far fewer, more easily detected, and much less dangerous.
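
As a small illustration (mine, not from the article) of the kind of checking those cycles buy, consider how a language like Rust handles an out-of-range array access:
fn main() {
  let readings = vec![98, 99, 97];

  // Checked access: asking for an element that is not there yields None
  // instead of reading past the end of the array.
  match readings.get(10) {
    Some(value) => println!("reading: {}", value),
    None => println!("no reading at index 10"),
  }

  // Plain indexing is also checked at run time; uncommenting this line
  // would stop the program with a clear error instead of quietly using
  // whatever happens to be in memory.
  // let oops = readings[10];
}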

Yes, there are still defects which can be exploited to hack into systems. We have not achieved perfection.

Our systems are much better with operating systems and programming languages that do the checking that they now do, and businesses and individuals can rely on computers to get the job done.

That's worth some CPU cycles.