the AOL search history DB snafu

and why you should NOT be surprised

Wed Aug 16 19:15:24 EDT 2006


You kissed your privacy goodbye a long time ago, right?

From Wikipedia:

On August 4th, 2006, AOL released a compressed text file on one of its websites containing twenty million search keywords for over 650,000 users over a 3-month period, intended for research purposes. AOL pulled the file from public access by the 7th, but not before it had been mirrored, P2P-shared and seeded via BitTorrent. News filtered down to the blogosphere and popular tech sites such as Digg and Wired News.

Whilst none of the records on the file are personally identifiable per se, certain keywords contain personally identifiable information [1] by means of the user typing in their own name (ego-searching), as well as their address, social security number or by other means. Each user is identified on this list by a unique sequential key, which enables the compilation of a user's search history.

AOL acknowledged it was a mistake and removed the data, although the files can still be downloaded from mirror sites. Additionally, several searchable databases of the report also exist on the internet. [2]

Mistake? If betraying the trust of 2/3 of a million subscribers equals a mistake, how do they define catastrophe?

Apart from the obvious PR quagmire that AOL now finds itself in, and the painful regret (or torn anus) that AOL users may be feeling (and should have been feeling since they signed up </rant>), the long-term impact is immeasurable. Their stock is falling [3]. They're giving away BYOA accounts, [4] (they'd have to at this point), a move which may cost Time Warner over a billion dollars by 2009. [5] They're facing penalties, fines, not to mention lawsuits. [6] If there's a bottom for any business to hit, they're very close. [7]

They should take a cue from ValuJet and change their name (again). [8, 9]

AOL states they keep 30 days of user-identifiable search history, and that a research division may keep three months or more of search history, but not associated with specific accounts, (the latter echoes what was released on 4 August). Google has already stated they will continue to store search queries and related info, and that they won't make the same mistake AOL did. [10, 11] Predictably, Yahoo! Search! will! do! the! same! Considering the staggering amount of infrastructure Google possesses, (Great Caesar's Ghost--Google has an estimated four PB of RAM alone), their data retention capabilities far exceed the 90 days of history AOL retains for research purposes. [12, 13]

That search you did recently for Paris' poodle porn may come back to haunt you. Even though you were just doing it for a friend.

Is this a watershed event? Not necessarily. It's just the first time such a breach has garnered so much mainstream media attention. For comparison, remember that in just the last twelve months, laptop thefts and similar data losses at various organizations exposed potentially far more sensitive data, representing over ten times the number of individuals AOL screwed. [14] One shining example comes from a group that is trying to win some kind of 'Greatest No. of Incidents' or 'Most Incompetent' Award in data loss: In May 2006, a laptop was stolen from a Veterans Affairs employee with records for more than 26 million U.S. Veterans. That's ten percent of the U.S. population as of a decade ago! That's beyond fubar, that's the MOAFU. [15, 16]

Has a precedent been set here? While this may be the first time the public has had access to such data, the U.S. government already possesses similar data, covering a longer period, from multiple search engines. The Department of Justice hammered search companies recently, and walked away with an undisclosed amount of data from three spineless organizations which caved in, (AOL, MSN, Yahoo!). [17, 18] When the DOJ demanded data from Google, which, not surprisingly, owns the bulk of the market but has been slowly slipping away from its "don't be evil" creed, the company fought back and managed to avoid the Gestapo clutches. [19, 20, 21, 22, 23]

Since the Justice Department started hounding search engines for user data, it should have become apparent to everyone, both inside and outside the industry, that search data was being collected in such a fashion that it could be mined and ultimately traced back to the user. The DOJ's possession of AOL, MSN and Yahoo! search data is only the beginning. When you consider the amount of uniquely identifiable information available, even before your ISP hands over your account details, I can't count the ways we're screwed. Think about it: your IP address and everything your web browser gives up, your associated accounts (a cookie for your Google, Yahoo! or MSN account...), your browsing history (DoubleClick can suck it), possibly your local user name, and of course your bloody search history. Yes, that poof sound you heard was your privacy vanishing.
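For anyone who doubts how little work "mined and traced back" takes, here is a minimal Python sketch. It assumes a log reduced to (anonymous ID, query) pairs; the rows, the names build_histories and flag_ssn_shaped, and the ddd-dd-dddd pattern are all made up for illustration and are not taken from the AOL file or from anyone's actual tooling. It only shows the general idea: one stable key per user is enough to stitch a history together, and a crude pattern match is enough to find the queries that give the user away.

```python
import re
from collections import defaultdict

# Invented sample rows in (anon_id, query) form. These are NOT from the real
# AOL file; every ID, name, and number here is a placeholder.
SAMPLE_LOG = [
    (1001, "cheap flights to reno"),
    (1002, "poodle grooming tips"),
    (1001, "jane q. public springfield"),
    (1001, "123-45-6789 credit report"),
    (1002, "neck massagers"),
]

# Assumption: anything shaped like ddd-dd-dddd is treated as SSN-like.
SSN_SHAPED = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def build_histories(rows):
    """Group queries by the anonymous key, keeping them in log order."""
    histories = defaultdict(list)
    for anon_id, query in rows:
        histories[anon_id].append(query)
    return dict(histories)


def flag_ssn_shaped(histories):
    """Return {anon_id: [queries]} for queries containing an SSN-shaped string."""
    flagged = {}
    for anon_id, queries in histories.items():
        hits = [q for q in queries if SSN_SHAPED.search(q)]
        if hits:
            flagged[anon_id] = hits
    return flagged


if __name__ == "__main__":
    histories = build_histories(SAMPLE_LOG)
    for anon_id, queries in histories.items():
        print(anon_id, queries)
    print("SSN-shaped:", flag_ssn_shaped(histories))
```

The "anonymous" number does nothing to protect user 1001 once an ego-search and an SSN-shaped string land in the same history. Swap the key for a cookie or an IP address and the same few lines apply.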

And just to appease the tinfoil hat crowd, let's carry this one obvious step further: The DOJ will collect the information under the auspices of protecting children from predators, preventing terrorism, or saving the Spotted Owl. But ultimately, they'll seize information and act in a fashion which doesn't follow the spirit of the law, coming after individuals much like the bloody thought police. Orwell must be gloating.

It is impossible to predict the consequences of government and third parties data-mining our search history. Furthermore, which poses the greater threat: government data mining, or malicious third parties? (Maybe they're one and the same). Congress' laughable renewed interest in privacy concerns and proposed limits for search engine retention is nothing more than an election-year stunt. [24]

Caveat Searcher.

What are some of the consequences for AOL? Possibly some FCC fines. Definitely some subscriber fallout, which means decreasing revenue from dial-up users bailing. The financial impact may extend to a significant stock drop for Time Warner. About time to spin off AOL, isn't it, fellas? Regardless, this will create a domino effect, as more bad publicity and further user departures cause business to suffer....

What are some of the consequences for AOL users? They'll definitely want to become part of the (inevitable) class-action suit. [25] Of course, their take in the winnings will be pitiful, and the attorneys will make out, well, like the bandits they are. AOL users will learn to search more, hmm, cautiously? Just kidding. Most of them won't learn anything from this. [26] It seems only a tiny minority of the general public is even aware of this incident. (Hearing a sound bite or catching a headline doesn't mean this has moved into the forefront of the public's consciousness. What will it take? See Craver, below). Pity. They should learn to watch their asses. There may be an investigation behind the "how to kill my wife" dude. Brilliant, jackass. Just brilliant.

What are some of the consequences for non-AOL users (the rest of us)? Maybe a tiny increase in responsible behavior while online? I hope. We'll definitely continue to tell friends and neighbors to dump them and get a real service provider.

Was this The Big One? That's one of the questions answered by Scott Craver, in a post on his blog, "The 'Data Valdez' versus the Privacy Ceiling." An assistant professor at Binghamton University, Craver implies that, although noteworthy, the AOL disaster isn't the kind of earth-splitting, makes-Nevada-beachfront-property quake that many of us are anticipating. He goes on to describe the attributes that define the Privacy Ceiling, and what it will take to make the public stop and take notice. (The comments thread following a related post at Freedom to Tinker goes into this even further). As George Carlin would say, the AOL meltdown was a near-hit. Craver is brilliant, and this post is insightful and informative as hell. (Kudos to Jericho and Lyger for forwarding these two).

Is there anonymity on the internet? Clearly not for these AOL victims. Are you surfing from an open wifi connection using someone else's laptop while wearing rubber gloves and a false mustache? [27, 28] Keep in mind, the search engine companies aren't the only ones sorting and mining your data. Even online retailers like Amazon are jumping into this. [29] Great. One little search to see who sells, um, neck massagers, and now my Amazon homepage looks like a perv's dream.

Is your search history private? Don't be ridiculous.

Are you logged into web mail while searching for <clears throat> entertainment sites? Nice. While you're at it, why not search for my ten favorite books? Your IP address alone is enough to get the DOJ on your doorstep, though it may not be the smoking gun the RIAA hopes for [30].

An ounce of prevention.... Use some common sense and some f#cking discretion when searching on a connection that can be pointed back to you. For that matter, take a page from gun safety--treat every connection as though it can be tracked back to you. [31] Search for entertainment *cough* porn *cough* on a connection other than from home or work. Use a proxy that supports anonymous connections, onion routing, or a service like Anonymizer (not affiliated). [32] And make sure said services are outside of your native country, in a land with a history of not turning over data to your government upon demand, (ixnay on edenSway, for now). [33, 34, 35] Don't believe for a second that an internet cafe or Starbucks provides any layer of insulation. And don't think hardware or OS-specific minorities are immune, either, (Macs, Lycoris, appliances, etc.).

Solution providers: use a hosting provider outside of the U.S.

Stop searching for your personal information, dammit! No SSNs, credit card numbers, full names, unlawful activities, or bad thoughts. And sure as hell don't look up how to kill your spouse. [36] With service provider "mistakes" like this, who needs enemies? Seems like a good day to support the EFF and the ACLU. Or support Attrition. Hey, Lyger's gotta eat.

Remember to get your minority report. [37, 38]


Check out...

AOL's Data Valdez - AOL Data Valdez and the Privacy Ceiling
(Could this be...) The Exxon Valdez of Privacy
background - Adam D'Angelo
Greg Linden speaks out in defense of AOL
Declan McCullagh's collection of bizarre and disturbing search queries
Slate - The Seven Ways that People Search the Web
The Consumerist - AOL User 927...
C|Net - FAQ: Protecting yourself from search engines
AOL Search DB

Thanks to Declan McCullagh, Jericho and especially Lyger for their help & input. Lyger is currently undergoing genital shock therapy trying to get my annoying-ass questions to stop echoing in his mind. If we don't get these people to a hospital, they'll die. A hospital! What is it? A big building full of patients, but that's not important right now...



Obvious disclaimer: This article is the author's violently-biased, stinking opinion, (very) loosely based on fact. Don't be so stupid as to believe I know jack sh!t. I obviously don't, or I wouldn't be writing this, I'd be sipping margaritas in Bora Bora while my compounding interest makes Google's (frightening) stock value look like chump change. I make no warranties, assume no liability, and genuinely believe you're responsible for your own damn self. Search engines are keystroke loggers. Welcome to Earth.

