[Infowarrior] - How the NSA Converts Spoken Words Into Searchable Text

Richard Forno rforno at infowarrior.org
Tue May 5 11:30:01 CDT 2015


How the NSA Converts Spoken Words Into Searchable Text

By Dan Froomkin
@froomkin
Today at 10:08 AM

Most people realize that emails and other digital communications they once considered private can now become part of their permanent record.

But even as they increasingly use apps that understand what they say, most people don’t realize that the words they speak are not so private anymore, either.

Top-secret documents from the archive of former NSA contractor Edward Snowden show the National Security Agency can now automatically recognize the content within phone calls by creating rough transcripts and phonetic representations that can be easily searched and stored.

The documents show NSA analysts celebrating the development of what they called “Google for Voice” nearly a decade ago.

Though perfect transcription of natural conversation apparently remains the Intelligence Community’s “holy grail,” the Snowden documents describe extensive use of keyword searching as well as computer programs designed to analyze and “extract” the content of voice conversations, and even use sophisticated algorithms to flag conversations of interest.

The documents include vivid examples of the use of speech recognition in war zones like Iraq and Afghanistan, as well as in Latin America. But they leave unclear exactly how widely the spy agency uses this ability, particularly in programs that pick up considerable amounts of conversations that include people who live in or are citizens of the United States.

Spying on international telephone calls has always been a staple of NSA surveillance, but the requirement that an actual person do the listening meant it was effectively limited to a tiny percentage of the total traffic. By leveraging advances in automated speech recognition, the NSA has entered the era of bulk listening.

And this has happened with no apparent public oversight, hearings or legislative action. Congress hasn’t shown signs of even knowing that it’s going on.

The USA Freedom Act, the surveillance reform bill that Congress is currently debating, doesn’t address the topic at all. The bill would end an NSA program that does not collect voice content: the government’s bulk collection of domestic calling data, showing who called whom and for how long.

Even if it becomes law, the bill would leave in place a multitude of mechanisms exposed by Snowden that scoop up vast amounts of innocent people’s text and voice communications in the U.S. and across the globe.

Civil liberties experts contacted by The Intercept said the NSA’s speech-to-text capabilities are a disturbing example of the privacy invasions that are becoming possible as our analog world transitions to a digital one.

“I think people don’t understand that the economics of surveillance have totally changed,” Jennifer Granick, civil liberties director at the Stanford Center for Internet and Society, told The Intercept.

“Once you have this capability, then the question is: How will it be deployed? Can you temporarily cache all American phone calls, transcribe all the phone calls, and do text searching of the content of the calls?” she said. “It may not be what they are doing right now, but they’ll be able to do it.”

And, she asked: “How would we ever know if they change the policy?”

Indeed, NSA officials have been secretive about their ability to convert speech to text, and how widely they use it, leaving open any number of possibilities.

That secrecy is the key, Granick said. “We don’t have any idea how many innocent people are being affected, or how many of those innocent people are also Americans.”

I Can Search Against It

NSA whistleblower Thomas Drake, who was trained as a voice processing crypto-linguist and worked at the agency until 2008, told The Intercept that he saw a huge push after the September 11, 2001, terror attacks to turn the massive amounts of voice communications being collected into something more useful.

Human listening was clearly not going to be the solution. “There weren’t enough ears,” he said.

The transcripts that emerged from the new systems weren’t perfect, he said. “But even if it’s not 100 percent, I can still get a lot more information. It’s far more accessible. I can search against it.”

Converting speech to text makes it easier for the NSA to see what it has collected and stored, according to Drake. “The breakthrough was being able to do it on a vast scale,” he said.

< - >

https://firstlook.org/theintercept/2015/05/05/nsa-speech-recognition-snowden-searchable-text/

--
It's better to burn out than fade away.


