Page 58 - hemispheres

computer, provided it’s running Google’s Chrome browser.
Ironically, Cohen, who took French in high school, has little facility with foreign languages (unless you count the inflections of his native Brooklynese), but he does have an ear for music. After graduating from Boston’s Berklee College of Music with a degree in composition, Cohen spent seven years playing guitar with his own sextet, and at one point even traveled to Haiti to study voodoo rhythms.
In the end, however, his scientific side won out. Cohen got a Ph.D. in computer science from the other Berkeley (the University of California) and spent the next two decades working in speech technology, both at Stanford and at a company of his own. “I was always partly a scientist, partly a composer,” he says. “But as I got interested in computation, I was drawn to speech recognition because the same set of cognitive questions came up.” Speech and music, he explains, “are the two complex auditory signals that humans have evolved to communicate with.”
In 2004, it became clear to Cohen that speech technology was going mobile. Though he considered starting another firm, he ultimately joined Google, drawn, he says, by the company’s “focus on big data.” He’s been working on its speech recognition technology ever since.
Here’s how it works: Say you’re looking for a place to grab dinner in L.A., but you’re in your car and it’s unsafe to type in your request. After you click the “mic” button on your phone, a box pops up onscreen instructing you to “speak now.” You say “Los Angeles restaurants” and, voilà, the voice search software transforms your words into text and links you to any number of relevant websites. (This is much the same functionality that the iPhone’s Siri software famously provides.)
But what if you’re one of the 5.5 billion people who don’t speak English? Cohen knew it wouldn’t be enough to offer voice search in just one language, so in 2009 the speech recognition team began the arduous process of bringing the tech to the rest of the globe. You’d think they’d start with something easy, like Spanish. “We chose Mandarin,” Cohen says, laughing. The idea was to get a head start on grappling with every possible linguistic obstacle in this notoriously complex Chinese dialect before moving on to other languages. “Besides, there were a couple of people in the Beijing office really raring to go,” he says, “and there’s nothing like a motivated engineer to make things happen.”
The language modeling process, which took almost a year for Mandarin Chinese, can now be carried out in a few weeks. Operating like a linguistic hit squad, a team of native speakers travels to the target country armed with 30 or 40 Android phones, which they distribute to temp employees who take them out into the community to record locals. A couple of days and 250,000 or so utterances later, the team uses the data to build a statistical model that “learns” enough of the vocabulary, grammar and syntax to be deployed on the country’s Google search page. As word spreads and millions of Koreans, Indians or Russians, for example, discover they can Google with just their voices, the model actually starts training itself through what Cohen calls “unsupervised learning.”
At last count, 27 language models have been completed, including five variants of English and four of Spanish, along with more exotic languages like Afrikaans, Bahasa Malay and even pig Latin, done as a stunt for April Fool’s Day.
And it goes even deeper. “Let me demo this for you,” Cohen says as he grabs my iPhone (his Android was on the blink that day, which should please the ghost of Steve Jobs to no end) and clicks on my Google Translate app. “Say I want to go from English to Spanish. And I want to do it by talking.” Cohen says “good night” into the phone and the text “buenas noches” appears (or, if he had hit the mic button, it would have been spoken). Selecting the program’s conversation mode will activate its “turn taking” feature, so if your Spanish comprises nothing beyond, well, “buenas noches,” you’ll still be able to converse with your Chilean colleague by sticking the phone in the middle of the table and waiting a few seconds for the device to translate what each of you says.
The technology remains a work in progress, but the goal is as simple as it is ambitious: to allow people who speak different languages to communicate seamlessly in real time, a sort of inverse Babel. “The user should never have to wonder whether they can accomplish their current task by speaking,” Cohen says. “If they want to speak, they should assume they can.”
ARNIE COOPER, a Santa Barbara, Calif.–based writer and part-time ESL instructor, uses Google Voice to help his students improve their accents, driving them crazy in the process.
JANUARY 2012
HEMISPHERESMAGAZINE.COM