Saturday, July 17, 2010

Bossy GPS voices, Kindle Tom's latest interview, and voice technology

CNN's story today, "Why GPS voices are so condescending," opens with something that always makes me laugh. Just the day before, I had typed something in an email very similar to what CNN's John D. Sutter describes:

"In this tech-saturated world, few things are more annoying than car navigation systems that yell at you for making a wrong turn.

"Re-CALC-ulating," the system says in that condescending robot voice, as if it is offended by having to rethink the route.

"Turn left at ... [sigh] ... recalculating ..."

The article continues with TomTom's CEO Mark Gretton explaining that your GPS unit's job is to give you a string of commands ("Do this, do that, turn right"), and a snippy voice may not help your driving mood.

  Actually, I'm usually just very thankful it's telling me where to go. Before the barking GPS, I was always getting lost; now I'm free of lost-anxiety. If you miss a turn, you know it'll get you to turn around and head where you should. Most of the time, anyway. I was in a tunnel once, in the left lane, when it suddenly told me to turn left. They're not flawless, though maybe that was GPS rage at my ignoring it at times.

The article is an interesting look at how hard it is to make voice systems sound more human, beyond the all-too-human, guilt-tripping scold mode:
'These machines face a striking number of technological hurdles in their efforts to sound un-robotic.

Complex speech patterns
The most obvious reason the computers have trouble is that human speech is almost infinitely complex. There are about 40 phonemes -- or basic sounds -- in the English language, but there are seemingly limitless combinations.

To try to get computers on the right track, voice technologists record human actors reading all kinds of wacky sentences, which are designed to elicit as many phoneme combinations as possible.

Computers store all these sentences in a database, chop them into sounds, and then remix them to make any possible combination of words.

The result is intelligible, but it's not quite human.  A super-high-quality computer voice might require 40 hours of voice recordings in order to sound nearly human, said Andy Aaron, a computer speech researcher at IBM.'

Nuance is the company that provides the Kindle's male voice, Tom, but I don't think the female voice (which sounds sterner to me) has been definitively identified.

Blog articles on Kindle Tom
  Here's the earlier KindleWorld story on Tom (the real guy whose voice the Kindle uses for its computerized pronunciations and inflections), placed on a special page that Kindle-edition blog subscribers can click through to. Website readers can click here. If you're reading this on a computer, that page also has a sample of Tom introducing himself, reading a script I modified from the one Bufo Calvin originally wrote for his own Kindle.

  Bufo wrote a blog article on both Kindle Tom AND how to use the Kindle text-to-speech feature, titled "You talkin’ to me? TTS 101 for the Kindle." He describes how, when, and why he uses this feature.

Funny forum-thread on "Tomisms"
  As mentioned in the earlier Kindle-voice article here, there is a fun Amazon Kindle forum thread titled "Tomisms," a collection of the Kindle voice's odd (mis)readings of various words (some of them are hilarious).

Newer interview with Kindle Tom
  Kindicted also recently did a newer interview with Tom Glynn, the man behind the computerized voice, after he released an album of songs in his own, non-computerized voice (baritone vocals and acoustic guitar). It's available as an MP3 set (track #8, "Back Home," is a good sample) and as a CD-R that Amazon makes on demand (I didn't know they were doing that).

  Part of the interview covers the technical side of recording short snippets of words ("diphones") that still work when strung together, so the result can approximate the right inflections and affect of speech even though the computerized interpreter doesn't "know" what's coming next in the sentence. (I'd think they must have a way of scanning ahead first, though?)
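  For the curious, here's a toy Python sketch of the diphone idea. A diphone runs from the middle of one sound to the middle of the next, so each snippet carries the transition between two phonemes. With about 40 phonemes there are at most roughly 40 × 40 = 1,600 such pairs, which is a finite set a voice actor can cover even though the word combinations are endless. I'm not claiming this is how Nuance's RealSpeak actually works. The ARPAbet-style labels ("k", "ae", "t"), the made-up "sil" silence marker, the function names, and the fake "recordings" (short lists of numbers) are all my own assumptions for illustration, and real synthesizers generally also smooth pitch and timing where the snippets join, which this skips.

```python
from typing import Dict, List, Sequence

# Assumed label for the pause before and after an utterance (illustrative only).
SILENCE = "sil"


def to_diphones(phonemes: Sequence[str]) -> List[str]:
    """Turn a phoneme sequence into the diphone units needed to say it.

    Each unit spans the transition between two adjacent sounds; padding with
    silence covers the start and the end of the utterance.
    """
    padded = [SILENCE, *phonemes, SILENCE]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def synthesize(phonemes: Sequence[str], inventory: Dict[str, List[float]]) -> List[float]:
    """Look up each diphone in a (hypothetical) recorded inventory and join them in order."""
    audio: List[float] = []
    for unit in to_diphones(phonemes):
        if unit not in inventory:
            # A real system would substitute a similar unit; this toy just fails loudly.
            raise KeyError(f"no recording for diphone {unit!r}")
        audio.extend(inventory[unit])
    return audio


if __name__ == "__main__":
    word = ["k", "ae", "t"]  # "cat", using assumed ARPAbet-style labels
    print(to_diphones(word))  # ['sil-k', 'k-ae', 'ae-t', 't-sil']

    # Fake inventory: each "recording" is just two numbers standing in for audio samples.
    fake_inventory = {u: [float(i)] * 2 for i, u in enumerate(to_diphones(word))}
    print(len(synthesize(word, fake_inventory)))  # 8
```

  Note that the whole phoneme string is known before any snippet is looked up, so in this toy, at least, the "scanning ahead" happens before any sound is played.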

The 4-part article
  Kindicted's article with the Tom interview is Part 1 of a 4-part series ("Kindle Text-to-Speech") on the history of text-to-speech development. There's also another interview, with a principal of the company that created RealSpeak, which is now owned by Nuance. (That company was Lernout & Hauspie, which had itself bought Berkeley Speech Technologies, where I coincidentally did some computer-network support.)

  Glynn now hears his own voice over the loudspeakers at CVS pharmacies, and in storm bulletins when the National Weather Service voice gives alerts. He is also the phone voice for Bank of America, United, Apple, and CVS.

  He was already addicted to his first-generation Kindle when he found out he was going to be its voice. It can't be easy, though, to hear people complain that the voice sounds robotic. Most people I know are surprised that it sounds less computerized than they'd expected, compared with voices they've heard in computer software. Part 1's intro to the interview ends with this:
'As an added bonus, this interview is available in mobi [Kindle] format here. Simply download the mobi file, transfer it to your Kindle, and play the interview using the default male voice (Tom’s). In some sense, Tom will be reading the interview aloud using his own voice!'

You can download that from Kindicted's Google Docs area.



  1. That's actually one of the reasons I like my current one: it doesn't make a comment if I vary the route (on purpose or accidentally), it just gives me the new route. Oh, and TomTom has begun releasing Star Wars voices. I find the Darth Vader one particularly amusing.

  2. Bufo,
    "I find your lack of faith disturbing" is a new variation on "ReCALCulating...." :-)

    Thanks for those!

