Wednesday, January 4, 2012

Vox populi

Apple's "Siri" has shown that voice recognition is not only possible but also practical (and fun). Voice-driven apps are naturals for the smartphone and tablet world, where keyboards and mice are foreigners. Voice input also solves the problem of composition on a tablet: most smartphone and tablet apps are better suited to consuming data, while desktop apps remain better for composing it.

The new world of voice-driven apps will bring a number of changes.

Voice-driven apps require a new design. One cannot simply replace a keyboard and mouse with voice commands; the user experience is too different.

The (relatively) commonplace task of designing a GUI for a program becomes problematic with voice-directed apps. Do we use commands such as "place a button below the list box"? This may lead to a renaissance of SHRDLU and the old "Adventure" and "Zork" programs, with commands such as "go north" and "take everything but the snake".

The testing of apps will change and will require new tools and techniques. How does one test a voice-driven app? Do you have people speak the commands, or do you record the commands and play them back to the app? How can you build a suite of automated tests for voice-driven apps?

Our notions of the workplace will change. For the past four decades, we have built workplaces from cubicles. Cubicles are adequate for people typing on keyboards, but they are poor environments for speaking to computers. With voice-driven apps, and everyone speaking to their computer, the noise in the typical office will increase. Voice-recognition software may be able to filter out background noise and voices; humans may have a harder time of it. Do we change our workplaces to individual offices?

What about the folks working at home, or working in co-working locations? We'll need quiet places to perform our work. At home, that may mean a separate room with a door. Co-working sites may change to suites of hard-walled offices.

The introvert/extrovert gap may be significant with voice-driven apps. Introverts emphasize written text, and they spend a lot of time composing their words. Extroverts speak readily and place less weight on refining an idea before they voice it. Extroverts share their ideas early and look to the group to help shape the final concepts; introverts think up front and have already decided by the time they hand you the document. I expect that extroverts will be more comfortable (or perhaps less uncomfortable) with voice-driven apps.

Do we combine voice-driven with gesture-driven input? Microsoft's Kinect has shown itself to be capable and reliable. Perhaps we will use a combination of voice and gesture to control computers and to create content. I can easily see a programmer developing the layout of a GUI with voice and gesture commands. "Place a new button at the bottom of the screen," says the developer, and the IDE creates a new button. "Change the text to 'Cancel'," says the developer. "Now move the button to the right," and the programmer gestures with his hands, pushing an imaginary button to the right until it is in the desired position.

Voice recognition is here. I see lots of changes ahead for developers, testers, and users.
