AOL Proudly Releases Massive Amounts of Private Data
By: Michael Arrington
Yet Another Update: AOL: "This was a screw up"
Further Update: Sometime after 7 pm the download link went down
as well, but there is at least one mirror site. AOL is in damage
control mode - the fact that they took the data down shows that
someone there had the sense to realize how destructive this
was, but it is also an admission of wrongdoing of sorts.
Either way, the data is now out there for anyone who wants to
use (or abuse) it.
Update: Sometime around 7 pm PST on Sunday, the AOL site referred
to below was taken down. The direct link to the data is still live.
A cached copy of the page is here.
AOL must have missed the uproar over the DOJ's demand for "anonymized"
search data last year that caused all sorts of pain for Microsoft
and Google. That's the only way to explain their release of data
that includes 20 million web queries from 650,000 AOL users.
The data includes all searches from those users for a three month
period this year, as well as whether they clicked on a
result, what that result was and where it appeared on the result
page. It's a 439 MB compressed download that expands to just over
2 GB. The data is available here (this link is directly to
the file) and the output is in ten tab-delimited text files.
The utter stupidity of this is staggering. AOL has released very
private data about its users without their permission. While
the AOL username has been changed to a random ID number, the
ability to analyze all searches by a single user will often
make it easy to determine who the user is, and what
they are up to. The data includes personal names, addresses,
social security numbers and everything else someone might
type into a search box.
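To see why a random ID number offers so little protection, here is a rough
Python sketch of how someone might group every query under the ID that
issued it. The column order (user_id, query, result_rank, clicked_url) is
my assumption based on the description above, not the documented layout of
the AOL files, and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Assumed column order, inferred from the article's description; the real
# files may name or order their fields differently.
FIELDS = ["user_id", "query", "result_rank", "clicked_url"]


def group_queries_by_user(lines):
    """Map each anonymized user ID to the distinct queries it issued."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    queries = defaultdict(list)
    for row in reader:
        if not row:
            continue
        # Pad short rows: a search without a click has no rank or URL.
        row = row + [""] * (len(FIELDS) - len(row))
        record = dict(zip(FIELDS, row))
        seen = queries[record["user_id"]]
        if record["query"] not in seen:
            seen.append(record["query"])
    return dict(queries)


# Made-up sample rows, not taken from the AOL files.
SAMPLE = (
    "1001\tweather boston\t\t\n"
    "1001\tused bikes\t2\thttp://example.com/bikes\n"
    "1002\tpasta recipes\t1\thttp://example.com/pasta\n"
    "1001\tweather boston\t3\thttp://example.com/weather\n"
)

if __name__ == "__main__":
    grouped = group_queries_by_user(io.StringIO(SAMPLE))
    for user_id, user_queries in grouped.items():
        print(user_id, user_queries)
```

Once the queries are collected per ID like this, reading one user's list
from top to bottom is all it takes to start piecing together who they are.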
The most serious problem is the fact that many people often
search on their own name, or those of their friends and
family, to see what information is available about them on
the net. Combine these ego searches with porn queries and
you have a serious embarrassment. Combine them with "buy
ecstasy" and you have evidence of a crime. Combine them with
an address, social security number, etc., and you have an identity
theft waiting to happen. The possibilities are endless.
Marketers are going nuts over the possibilities, users are calling for
a boycott of AOL, and others are just enraged:
User 491577 searches for "florida cna pca lakeland tampa", "emt
school training florida", "low calorie meals", "infant seat", and
"fisher price roller blades". Among user 39509's hundreds of searches
are: "ford 352", "oklahoma disciplined pastors", "oklahoma disciplined
doctors", "home loans", and some other personally identifying and illegal
stuff I'm going to leave out of here. Among user 545605's searches are
"shore hills park mays landing nj", "frank william sindoni md", "ceramic
ashtrays", "transfer money to china", and "capital gains on sale of
house". Compared to some of the data, these examples are on the safe side.
I'm leaving out the worst of it - searches for names of specific people,
addresses, telephone numbers, illegal drugs, and more. There is no question
that law enforcement, employers, or friends could figure out who some of
these people are.
There is some really scary stuff in this data.
I am assuming that AOL will take this page and the data down soon, but as of
the time of this post it has been downloaded 809 times already. People
I've spoken with are already building a web interface to the data. If you are
an AOL customer, I feel sorry for you.
Note that Microsoft has proposed releasing similar data to researchers,
although with an important difference - the data is not associated with
a user. Excite released data very similar to what AOL has done here,
with user associations, in 1999.
AOL is hitting bottom when it comes to brand image. This story comes on
the heels of the disastrous recorded customer service phone call, as
well as a just-in story about a woman who is unable to cancel her deceased
father's AOL account, nine months after his death.