Sunday, October 29, 2017

We have a problem

The Rust programming language has a problem.

The problem is one of compactness, or the lack thereof. This problem was brought to my attention by a blog post about the Unix 'yes' program.

In short, Rust requires a lot of code to handle a very simple task.

The simple task, in this case, is the "yes" program from Unix. This program writes the string "y\n" ('y' followed by a newline) to standard output over and over, as fast as it can, until it is killed. Given an argument, it repeats that argument instead.

Here's the program in C:
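#include <stdio.h>

/* Print the first argument (or "y" if there is none) on its own line, forever. */
int main(int argc, char **argv)
{
  for (;;)
    printf("%s\n", argc > 1 ? argv[1] : "y");
}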
And here is an attempt in Rust:
use std::env;

fn main() {
  // Repeat the first argument, or "y" if there is none, forever.
  let expletive = env::args().nth(1).unwrap_or_else(|| "y".into());
  loop {
    println!("{}", expletive);
  }
}
The Rust version is quite slow compared to the C version, so the author and others made some "improvements" to Make It Go Fast:
use std::env;
use std::io::{self, Write};
use std::process;
use std::borrow::Cow;

use std::ffi::OsString;
pub const BUFFER_CAPACITY: usize = 64 * 1024;

pub fn to_bytes(os_str: OsString) -> Vec<u8> {
  use std::os::unix::ffi::OsStringExt;
  os_str.into_vec()
}

fn fill_up_buffer<'a>(buffer: &'a mut [u8], output: &'a [u8]) -> &'a [u8] {
  if output.len() > buffer.len() / 2 {
    return output;
  }

  let mut buffer_size = output.len();
  buffer[..buffer_size].clone_from_slice(output);

  while buffer_size < buffer.len() / 2 {
    let (left, right) = buffer.split_at_mut(buffer_size);
    right[..buffer_size].clone_from_slice(left);
    buffer_size *= 2;
  }

  &buffer[..buffer_size]
}

fn write(output: &[u8]) {
  let stdout = io::stdout();
  let mut locked = stdout.lock();
  let mut buffer = [0u8; BUFFER_CAPACITY];

  let filled = fill_up_buffer(&mut buffer, output);
  while locked.write_all(filled).is_ok() {}
}

fn main() {
  write(&env::args_os().nth(1).map(to_bytes).map_or(
    Cow::Borrowed(
      &b"y\n"[..],
    ),
    |mut arg| {
      arg.push(b'\n');
      Cow::Owned(arg)
    },
  ));
  process::exit(1);
}
Now, that's a lot of code. Really a lot. For a simple task.
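Much of the first Rust version's slowness probably comes from how it uses standard output rather than from the loop itself: println! takes the lock on stdout on every call, and Rust's stdout is line-buffered, so each "y\n" can turn into its own write system call. C's stdout, by contrast, is fully buffered when it is not connected to an interactive device such as a terminal. A minimal sketch that changes only those two things, locking stdout once and wrapping it in a std::io::BufWriter, might look like this. The helper name repeat_line and the 64 KiB capacity are illustrative choices, not taken from either blog post, and this is a sketch of the idea rather than a measured result.

```rust
use std::env;
use std::io::{self, BufWriter, Write};

/// Write `line` plus a newline to `out` until a write fails, and return that
/// error (for example a broken pipe once the reader goes away).
fn repeat_line<W: Write>(out: &mut W, line: &str) -> io::Error {
    loop {
        if let Err(e) = writeln!(out, "{}", line) {
            return e;
        }
    }
}

fn main() {
    // Same argument handling as the first Rust version.
    let expletive = env::args().nth(1).unwrap_or_else(|| "y".into());
    // Lock stdout once, not once per line, and collect output in a 64 KiB
    // buffer so it reaches the OS in large chunks rather than line by line.
    let stdout = io::stdout();
    let mut out = BufWriter::with_capacity(64 * 1024, stdout.lock());
    repeat_line(&mut out, &expletive);
}
```

One trade-off explains part of the author's extra code: env::args panics on an argument that is not valid UTF-8, which is presumably why the longer version reads its argument through env::args_os and OsString instead.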

To be fair, the author mentions that the GNU version of 'yes' weighs in at 128 lines, more than twice the length of the author's Rust monstrosity. But another blogger posted this C code, which improves performance by handing 8 KiB (8192 bytes) to each write call:
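#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define LEN 2
#define TOTAL 8192

int main(void) {
    char yes[LEN] = {'y', '\n'};
    char *buf = malloc(TOTAL);
    if (buf == NULL)
        return 1;
    /* Fill the 8 KiB buffer with copies of "y\n". */
    int bufused = 0;
    while (bufused < TOTAL) {
        memcpy(buf + bufused, yes, LEN);
        bufused += LEN;
    }
    /* Write the whole buffer each time; stop as soon as write() reports an error. */
    while (write(1, buf, TOTAL) > 0)
        ;
    return 1;
}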
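For a like-for-like comparison, here is a sketch of that same fixed-buffer strategy in Rust, using only the standard library. It fills an 8 KiB buffer with "y\n" pairs, takes the stdout lock once, and hands the whole buffer to each write. The helper name fill_buffer is hypothetical, and it builds the buffer with an iterator (cycle and take) instead of the C code's memcpy loop. Like the C program, it ignores command-line arguments and exits with status 1 once a write fails.

```rust
use std::io::{self, Write};
use std::process;

// Same size as the C version's TOTAL: 8 KiB of output per write.
const TOTAL: usize = 8192;

/// Build a buffer of `total` bytes holding `pattern` repeated back to back.
/// Like the C code, this assumes `total` is a multiple of `pattern.len()`;
/// otherwise the last copy of the pattern is cut short.
fn fill_buffer(pattern: &[u8], total: usize) -> Vec<u8> {
    pattern.iter().copied().cycle().take(total).collect()
}

fn main() {
    let buf = fill_buffer(b"y\n", TOTAL);
    {
        // Take the stdout lock once, then hand the whole buffer to each write.
        let stdout = io::stdout();
        let mut out = stdout.lock();
        // write_all returns an error once output fails (e.g. a closed pipe).
        while out.write_all(&buf).is_ok() {}
    } // the lock is released here
    // As in the C version, we only get here on a write error: exit with 1.
    process::exit(1);
}
```

Rust ignores SIGPIPE by default, so a closed pipe shows up here as a write error that ends the loop, whereas the C program is normally killed by the signal.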

Programming languages should be saving us work. The high-performance solution in Rust is long, way too long, for such simple operations.

We have a problem. It may be in our programming languages. It may be in run-time libraries. It may be in the operating systems and their APIs. It may be in the hardware architecture. It may be a combination of several.

But a problem we have.
