Page 58 - hemispheres

computer, provided it’s running Google’s

Chrome browser.

Ironically, Cohen, who took French in

high school, has little facility with foreign

languages (unless you count the inflec-

tions of his native Brooklynese), but he

does have an ear for music. After graduating

from Boston’s Berklee College of Music

with a degree in composition, Cohen spent

seven years playing guitar with his own

sextet, and at one point even traveled to

Haiti to study voodoo rhythms.

In the end, however, his scientific side

won out. Cohen got a Ph.D. in computer

science from the other Berkeley—at the

University of California—and spent the

next two decades working in speech tech-

nology both at Stanford and at a company

of his own. “I was always partly a scientist,

partly a composer,” he says. “But as I got

interested in computation, I was drawn to

speech recognition because the same set

of cognitive questions came up.” Speech

and music, he explains, “are the two com-

plex auditory signals that humans have

evolved to communicate with.”

In 2004, it became clear to Cohen that

speech technology was going mobile.

Though he thought about starting another

firm, he ultimately went with Google,

impressed by the company’s “focus on big

data,” he says. He’s been working on its

speech recognition technology ever since.

Here’s how it works: Say you’re looking

for a place to grab dinner in L.A., but you’re

in your car and it’s unsafe to type in your

request. After you click the “mic” button

on your phone, a box pops up onscreen

instructing you to “speak now.” You say

“Los Angeles restaurants” and—voilà!—

the voice search software transforms your

words to text and links you to any number

of relevant websites. (This is also much the

same functionality that the iPhone’s Siri

software famously provides.)

But what if you’re one of the 5.5 billion

people who don’t speak English? Cohen

knew it wouldn’t be enough to offer voice

search in just one language, so in 2009 the

speech recognition team began the ardu-

ous process of bringing the tech to the rest

of the globe. You’d think they’d start with

something easy, like Spanish; “We chose

Mandarin,” Cohen says, laughing. The

idea was to get a head start on grappling

with every possible linguistic obstacle in

this notoriously complex Chinese dialect

before moving on to other languages.

“Besides, there were a couple of people in

the Beijing office really raring to go,” he

says, “and there’s nothing like a motivated

engineer to make things happen.”

The language modeling process, which

took almost a year for Mandarin Chinese,

can now be carried out in a few weeks.

Operating like a linguistic hit squad, a

team of native speakers travels to the tar-

get country armed with 30 or 40 Android

phones, which they distribute to temp

employees who take them out into the

community to record locals. A couple of

days and 250,000 or so utterances later, the

data is used to create a statistical model

that “learns” enough of the vocabulary,

grammar and syntax to be deployed on

the country’s Google search page. As word

spreads and millions of Koreans, Indians

or Russians, for example, discover they can

Google with just their voices, the model

actually starts training itself, through

what Cohen calls “unsupervised learning.”

At last count, 27 language models have

been completed, including five variants of

English and four of Spanish, along with

more exotic languages like Afrikaans,

Bahasa Malay and even pig Latin, done as

a stunt for April Fool’s Day.

And it goes even deeper. “Let me demo

this for you,” Cohen says as he grabs my

iPhone (his Android was on the blink that

day, which should please the ghost of Steve

Jobs to no end) and clicks on my Google

Translate app. “Say I want to go from

English to Spanish. And I want to do it

by talking.” Cohen says “good night” into

the phone and the text “buenas noches”

appears (or, if he had hit the mic button,

it would have been spoken). Selecting the

program’s conversation mode will activate

its “turn taking” feature, so if your Spanish

comprises nothing beyond, well, “buenas

noches,” you’ll still be able to converse

with your Chilean colleague by sticking

the phone in the middle of the table and

waiting a few seconds for the device to

translate what each of you says.

The technology remains a work in

progress, but the goal is as simple as it

is ambitious: to allow people who speak

different languages to communicate seam-

lessly in real time, a sort of inverse Babel.

“The user should never have to wonder

whether they can accomplish their cur-

rent task by speaking,” Cohen says. “If

they want to speak, they should assume

they can.”

ARNIE COOPER,

a Santa Barbara, Calif.–based

writer and part-time ESL instructor, uses

Google Voice to help his students improve their

accents—driving them crazy in the process.

“I was always partly a scientist, partly a

composer,” says Google’s Mike Cohen.

“But as I got interested in computation,

I was drawn to speech recognition

because the same set of cognitive

questions came up. [Speech and music]

are the two complex auditory signals

that humans communicate with.”

JANUARY 2012 • HEMISPHERESMAGAZINE.COM
